SURFACE EXPRESSION LIBRARIES
OF HETEROMERIC RECEPTORS 



BACKGROUND OF THE INVENTION

This invention relates generally to recombinant
expression of heteromeric receptors and, more particularly,
to expression of such receptors on the surface of
filamentous bacteriophage.

Antibodies are heteromeric receptors, generated by a
vertebrate organism's immune system, which bind to an
antigen. The molecules are composed of two heavy and two
light chains disulfide bonded together. Antibodies have a
"Y"-shaped structure, with the antigen binding portion
located at the end of both short arms of the Y. The region
on the heavy and light chain polypeptides which corresponds
to the antigen binding portion is known as the variable
region. The differences between antibodies within this
region are primarily responsible for the variation in
binding specificities between antibody molecules. The
binding specificities are a composite of the antigen
interactions with both heavy and light chain polypeptides.

The immune system has the capability of generating an
almost infinite number of different antibodies. Such a
large diversity is generated primarily through
recombination to form the variable regions of each chain
and through differential pairing of heavy and light chains.
The ability to mimic the natural immune system and generate
antibodies that bind to any desired molecule is valuable
because such antibodies can be used for diagnostic and
therapeutic purposes.



Until recently, generation of antibodies against a
desired molecule was accomplished only through manipulation
of natural immune responses. Methods included classical
immunization techniques of laboratory animals and
monoclonal antibody production. Generation of monoclonal
antibodies is laborious and time consuming. It involves a
series of different techniques and is only performed on
animal cells. Animal cells have relatively long generation
times and require extra precautions to be taken compared to
procaryotic cells to ensure viability of the cultures.

A method for the generation of a large repertoire of
diverse antibody molecules in bacteria has been described
by Huse et al., Science, 246:1275-1281 (1989), which is
herein incorporated by reference. The method uses
bacteriophage lambda as the vector. The lambda vector is
a long, linear double-stranded DNA molecule. Production of
antibodies using this vector involves the cloning of heavy
and light chain populations of DNA sequences into separate
vectors. The vectors are subsequently combined randomly to
form a single vector which directs the coexpression of
heavy and light chains to form antibody fragments. A
disadvantage of this method is that undesired combinations
of vector portions are brought together when generating the
coexpression vector. Although these undesired combinations
do not produce viable phage, they do result in a
significant loss of sequences from the population and,
therefore, a loss in the diversity of combinations which
can be obtained between heavy and light chains.
Additionally, the lambda phage genome is large compared to
the genes that encode the antibody segments, which makes
the lambda system inherently more difficult to manipulate
than other available vector systems.

There thus exists a need for a method to generate
diverse populations of heteromeric receptors which mimics
the natural immune system, which is fast and efficient and
results in only desired combinations without loss of
diversity. The present invention satisfies these needs and
provides related advantages as well.

SUMMARY OF THE INVENTION

The invention relates to a plurality of cells
containing diverse combinations of first and second DNA
sequences encoding first and second polypeptides which form
a heteromeric receptor, said heteromeric receptors being
expressed on the surface of a cell, preferably one which
produces filamentous bacteriophage, such as M13. Vectors,
cloning systems and methods of making and screening the
heteromeric receptors are also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic diagram of the two vectors
used for surface expression library construction from heavy
and light chain libraries. M13IX30 (Figure 1A) is the
vector used to clone the heavy chain sequences (open box).
The single-headed arrow represents the Lac p/o expression
sequences and the double-headed arrow represents the
portion of M13IX30 which is to be combined with M13IX11.
The amber stop codon and relevant restriction sites are
also shown. M13IX11 (Figure 1B) is the vector used to
clone the light chain sequences (hatched box). Thick lines
represent the pseudo-wild type (gVIII) and wild type
(gVIII) gene VIII sequences. The double-headed arrow
represents the portion of M13IX11 which is to be combined
with M13IX30. Relevant restriction sites are also shown.
Figure 1C shows the joining of vector populations from the
heavy and light chain libraries to form the functional
surface expression vector M13IXHL. Figure 1D shows the
generation of a surface expression library in a
non-suppressor strain and the production of phage. The
phage are used to infect a suppressor strain (Figure 1E)
for surface expression and screening of the library.

Figure 2 is the nucleotide sequence of M13IX30 (SEQ ID
NO: 1).

Figure 3 is the nucleotide sequence of M13IX11 (SEQ ID
NO: 2).

Figure 4 is the nucleotide sequence of M13IX34 (SEQ ID
NO: 3).

Figure 5 is the nucleotide sequence of M13IX13 (SEQ ID
NO: 4).

Figure 6 is the nucleotide sequence of M13IX60 (SEQ ID
NO: 5).

DETAILED DESCRIPTION OF THE INVENTION 

This invention is directed to simple and efficient
methods to generate a large repertoire of diverse
combinations of heteromeric receptors. The method is
advantageous in that only proper combinations of vector
portions are randomly brought together for the coexpression
of different DNA sequences without loss of population size
or diversity. The receptors can be expressed on the
surface of cells, such as those producing filamentous
bacteriophage, which can be screened in large numbers. The
nucleic acid sequences encoding the receptors can be
readily characterized because filamentous bacteriophage
produce single-stranded DNA for efficient sequencing and
mutagenesis methods. The heteromeric receptors so produced
are useful in a wide variety of diagnostic and therapeutic
procedures.

In one embodiment, two populations of diverse heavy
(Hc) and light (Lc) chain sequences are synthesized by
polymerase chain reaction (PCR). These populations are
cloned into separate M13-based vectors containing elements
necessary for expression. The heavy chain vector contains
a gene VIII (gVIII) coat protein sequence so that
translation of the Hc sequences produces gVIII-Hc fusion
proteins. The populations of the two vectors are randomly
combined such that only the vector portions containing the
Hc and Lc sequences are joined into a single circular
vector. The combined vector directs the coexpression of
both Hc and Lc sequences for assembly of the two
polypeptides and surface expression on M13. A mechanism
also exists to control the expression of gVIII-Hc fusion
proteins during library construction and screening.

As used herein, the term "heteromeric receptors"
refers to proteins composed of two or more subunits which
together exhibit binding activity toward a particular
molecule. It is understood that the term includes
fragments of the subunits so long as assembly of the
polypeptides and function of the assembled complex are
retained. Heteromeric receptors include, for example,
antibodies and fragments thereof such as Fab and (Fab)2
portions, T cell receptors, integrins, hormone receptors
and transmitter receptors.

As used herein, the term "preselected molecule" refers 
to a molecule which is chosen from a number of choices.
The molecule can be, for example, a protein or peptide, or 
an organic molecule such as a drug. Benzodiazapam is a 
specific example of a preselected molecule. 

As used herein, the term "coexpression" refers to the
expression of two or more nucleic acid sequences usually
expressed as separate polypeptides. For heteromeric
receptors, the coexpressed polypeptides assemble to form
the heteromer. Accordingly, "expression elements" as used
herein refers to sequences necessary for the
transcription, translation, regulation and sorting of the
expressed polypeptides which make up the heteromeric
receptors. The term "coexpression" also includes the
expression of two subunit polypeptides which are linked
but are able to assemble into a heteromeric receptor. A
specific example of coexpression of linked polypeptides is
where Hc and Lc polypeptides are expressed with a flexible
peptide or polypeptide linker joining the two subunits into
a single chain. The linker is flexible enough to allow
association of the Hc and Lc portions into a functional
Fab fragment.

The invention provides for a composition of matter 
comprising a plurality of procaryotic cells containing 
diverse combinations of first and second DNA sequences 
encoding first and second polypeptides which form a 
heteromeric receptor exhibiting binding activity toward a
preselected molecule, said heteromeric receptors being 
expressed on the surface of filamentous bacteriophage. 

DNA sequences encoding the polypeptides of
heteromeric receptors are obtained by methods known to one
skilled in the art. Such methods include, for example,
cDNA synthesis and polymerase chain reaction (PCR). The
need will determine which method or combination of methods
is to be used to obtain the desired populations of
sequences. Expression can be performed in any compatible
vector/host system. Such systems include, for example,
plasmids or phagemids in procaryotes such as E. coli, yeast
systems and other eucaryotic systems such as mammalian
cells. The invention will be described herein in the
context of its presently preferred embodiment, i.e.,
expression on the surface of filamentous bacteriophage.
Filamentous bacteriophage include, for example, M13, f1
and fd. Additionally, the heteromeric receptors can also be
expressed in soluble or secreted form depending on the need
and the vector/host system employed.
Expression of heteromeric receptors such as antibodies
or functional fragments thereof on the surface of M13 can
be accomplished, for example, using the vector system shown
in Figure 1. The construction of the vectors, which enables
one of ordinary skill to make them, is set out in Example I.
The complete nucleotide sequences are given in Figures 2
and 3 (SEQ ID NOS: 1 and 2). This system produces randomly
combined populations of heavy (Hc) and light (Lc) chain
antibody fragments functionally linked to expression
elements. The Hc polypeptide is produced as a fusion
protein with the M13 coat protein encoded by gene VIII.
The gVIII-Hc fusion protein therefore anchors the
assembled Hc and Lc polypeptides on the surface of M13.
The diversity of Hc and Lc combinations obtained by this
system can be 5 x 10^7 or greater. Diversity of less than
5 x 10^7 can also be obtained and will be determined by the
need and type of heteromeric receptor to be expressed.
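To give a rough sense of what a library of this size samples, the following Python sketch estimates how many distinct Hc/Lc pairings are expected among a given number of library clones. It is an illustration only and rests on assumptions that are not part of the description above: every possible pairing is taken to be equally likely, clones are treated as independent random draws, and the population sizes in the usage line are hypothetical.

```python
import math


def expected_distinct_pairs(n_heavy: int, n_light: int, n_clones: int) -> float:
    """Expected number of distinct Hc/Lc pairings among n_clones clones.

    Assumes (hypothetically) that each clone carries one of the
    n_heavy * n_light possible pairings, chosen uniformly at random and
    independently of the other clones.
    """
    k = n_heavy * n_light  # number of possible pairings
    if k < 1 or n_clones < 0:
        raise ValueError("population sizes must be positive")
    if k == 1:
        return 1.0 if n_clones > 0 else 0.0
    # k * (1 - (1 - 1/k) ** n), computed with log1p/expm1 so the result
    # stays accurate when k is very large and 1/k is tiny.
    return -k * math.expm1(n_clones * math.log1p(-1.0 / k))


# Hypothetical example: 10^4 heavy and 10^4 light chain clones combined
# into a library of 5 x 10^7 clones.
print(f"{expected_distinct_pairs(10_000, 10_000, 50_000_000):.3g}")
```

Under these assumptions, 5 x 10^7 clones drawn from 10^8 possible pairings would contain roughly 3.9 x 10^7 distinct combinations, since some pairings are drawn more than once.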

Populations of Hc and Lc encoding sequences to be
combined into a vector for coexpression are each cloned
into separate vectors. For the vectors shown in Figure 1,
diverse populations of sequences encoding Hc polypeptides
are cloned into M13IX30 (SEQ ID NO: 1). Sequences encoding
Lc polypeptides are cloned into M13IX11 (SEQ ID NO: 2).
The populations are inserted between the Xho I-Spe I or Stu
I restriction enzyme sites in M13IX30 and between the Sac
I-Xba I or Eco RV sites in M13IX11 (Figures 1A and 1B,
respectively).

The populations of Hc and Lc sequences inserted into
the vectors can be synthesized with appropriate restriction
recognition sequences flanking opposite ends of the
encoding sequences, but this is not necessary. The sites
allow these sequences to be annealed and ligated, in frame
with the expression elements, into a double-stranded vector
restricted with the appropriate restriction enzyme.
Alternatively, and in a preferred embodiment, the Hc and Lc
sequences can be inserted into the vector without
restriction of the DNA. This method of cloning is
beneficial because restriction enzyme sites may be
naturally present within the sequences, in which case
treatment with a restriction enzyme would destroy the
sequence. For cloning without restriction, the sequences
are treated briefly with a 3' to 5' exonuclease such as T4
DNA polymerase or exonuclease III. A 5' to 3' exonuclease
will also accomplish the same function. The protruding 5'
termini which remain should be complementary to
single-stranded overhangs within the vector which remain
after restriction at the cloning site and treatment with
exonuclease. The exonuclease treated inserts are annealed
with the restricted vector by methods known to one skilled
in the art. The exonuclease method decreases background
and is easier to perform.
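The requirement that the exonuclease-generated overhangs of insert and vector be complementary can be modeled with simple string operations. The Python sketch below is a simplified model, not a protocol: it assumes a blunt duplex chewed back by exactly n nucleotides from each 3' end, ignores partial or uneven digestion, and uses hypothetical 6-nt junction sequences. In this model, an insert end anneals to a vector end when the insert terminus repeats the sequence found at that vector terminus.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def chew_back(top: str, n: int) -> tuple[str, str]:
    """Simplified 3'->5' exonuclease digestion of a blunt duplex.

    `top` is the upper strand 5'->3'; the lower strand is its reverse
    complement. Removing n nt from each 3' end leaves a single-stranded
    5' overhang at both ends. Both overhangs are returned 5'->3' on the
    strand that carries them, as (left end, right end).
    """
    if not 0 < n < len(top):
        raise ValueError("n must be between 1 and len(top) - 1")
    left = top[:n]             # upper strand's 5' end is exposed on the left
    right = revcomp(top[-n:])  # lower strand's 5' end is exposed on the right
    return left, right


def overhangs_anneal(a: str, b: str) -> bool:
    """Two 5' overhangs base-pair if one is the reverse complement of the other."""
    return a == revcomp(b)


# Hypothetical sequences: the insert ends repeat the vector sequence at the
# cloning-site junctions (ACGTTG on the left, TGGCAT on the right).
insert = "ACGTTG" + "CCCAAAGGG" + "TGGCAT"
vector_left_arm = "TTTTTTTT" + "ACGTTG"   # its right end faces the insert
vector_right_arm = "TGGCAT" + "GGGGGGGG"  # its left end faces the insert

ins_left, ins_right = chew_back(insert, 6)
_, left_arm_end = chew_back(vector_left_arm, 6)
right_arm_end, _ = chew_back(vector_right_arm, 6)

print(overhangs_anneal(ins_left, left_arm_end))    # True
print(overhangs_anneal(ins_right, right_arm_end))  # True
```

Because only matching termini anneal, an insert can join the vector only in the orientation defined by its end sequences, which is consistent with the reduced background noted above.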

The vector used for Hc populations, M13IX30 (Figure
1A; SEQ ID NO: 1), contains, in addition to expression
elements, a sequence encoding the pseudo-wild type gVIII
product downstream and in frame with the cloning sites.
This gene encodes the wild type M13 gVIII amino acid
sequence but has been changed at the nucleotide level to
reduce homologous recombination with the wild type gVIII
contained on the same vector. The wild type gVIII is
present to ensure that at least some functional, non-fusion
coat protein will be produced. The inclusion of a wild
type gVIII therefore reduces the possibility of non-viable
phage production and biological selection against certain
peptide fusion proteins. Differential regulation of the
two genes can also be used to control the relative ratio of
the pseudo and wild type proteins.

Also contained downstream and in frame with the
cloning sites is an amber stop codon. The stop codon is
located in frame between the inserted Hc sequences and the
gVIII sequence. Like the wild type gVIII, the amber stop
codon also reduces biological selection when combining
vector portions to produce functional surface expression
vectors. This is accomplished by using a non-suppressor
(sup O) host strain, because non-suppressor strains will
terminate expression after the Hc sequences but before the
pseudo gVIII sequences. Therefore, the pseudo gVIII will
essentially never be expressed on the phage surface under
these circumstances. Instead, only soluble Hc polypeptides
will be produced. Expression in a non-suppressor host
strain can be advantageously utilized when one wishes to
produce large populations of antibody fragments. Stop
codons other than amber, such as opal and ochre, or
molecular switches, such as inducible repressor elements,
can also be used to unlink peptide expression from surface
expression.
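The effect of the amber codon can be illustrated with a toy translation model. In the Python sketch below, translation stops at TAG in a non-suppressor host, while in a suppressor host TAG is read through. The choice of glutamine as the inserted residue is an assumption (it corresponds to a supE-type amber suppressor, which the description above does not specify), and the short Hc and gVIII stand-in sequences are hypothetical placeholders, not the sequences of M13IX30.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str, amber_suppressor: bool = False) -> str:
    """Translate an open reading frame until the first unsuppressed stop codon."""
    protein = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[i:i + 3]
        if codon == "TAG" and amber_suppressor:
            protein.append("Q")  # assumption: supE-type suppressor inserts Gln
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":  # ochre (TAA), opal (TGA) or unsuppressed amber (TAG)
            break
        protein.append(aa)
    return "".join(protein)


hc_stand_in = "ATGGCTGAA"     # hypothetical Hc placeholder (Met-Ala-Glu)
gviii_stand_in = "GCTGAG"     # hypothetical gVIII placeholder (Ala-Glu)
construct = hc_stand_in + "TAG" + gviii_stand_in

print(translate(construct))                         # soluble Hc only
print(translate(construct, amber_suppressor=True))  # Hc-gVIII fusion
```

In the non-suppressor case only the Hc stand-in is produced, mirroring the soluble Hc polypeptides described above; in the suppressor case the reading frame continues into the gVIII stand-in.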

The vector used for Lc populations, M13IX11 (SEQ ID
NO: 2), contains the necessary expression elements and
cloning sites for the Lc sequences (Figure 1B). As with
M13IX30, upstream and in frame with the cloning sites is a
leader sequence for sorting to the phage surface.
Additionally, a ribosome binding site and Lac Z
promoter/operator elements are also present for
transcription and translation of the DNA sequences.

Both vectors contain two pairs of Mlu I-Hind III
restriction enzyme sites (Figures 1A and 1B) for joining
together the Hc and Lc encoding sequences and their
associated vector sequences. Mlu I and Hind III are
non-compatible restriction sites. The two pairs are
symmetrically oriented about the cloning site so that
only the vector portions containing the sequences to be
expressed are exactly combined into a single vector. The
two pairs of sites are oriented identically with respect to
one another on both vectors, and the DNA between the two
sites must be homologous enough between both vectors to
allow annealing. This orientation allows cleavage of each
circular vector into two portions and combination of
essential components within each vector into a single
circular vector where the encoded polypeptides can be
coexpressed (Figure 1C).

Any two pairs of restriction enzyme sites can be used
so long as they are symmetrically oriented about the
cloning site and identically oriented on both vectors.
The sites within each pair, however, should be
non-identical or able to be made differentially
recognizable as a cleavage substrate. For example, the two
pairs of restriction sites contained within the vectors
shown in Figure 1 are Mlu I and Hind III. The sites are
differentially cleavable by Mlu I and Hind III,
respectively. One skilled in the art knows how to
substitute alternative pairs of restriction enzyme sites
for the Mlu I-Hind III pairs described above. Also,
instead of two Hind III and two Mlu I sites, a Hind III and
Not I site can be paired with a Mlu I and a Sal I site, for
example.
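Why sites such as Mlu I and Hind III are "non-compatible" can be checked from the single-stranded ends each enzyme leaves. The Python sketch below assumes the standard recognition sequences and cut positions for the four enzymes named above (Mlu I A^CGCGT, Hind III A^AGCTT, Not I GC^GGCCGC, Sal I G^TCGAC). All four sites are palindromic and leave 5' overhangs; the sketch does not handle enzymes that leave 3' overhangs or blunt ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# enzyme: (recognition site, cut position on the top strand)
ENZYMES = {
    "MluI": ("ACGCGT", 1),
    "HindIII": ("AAGCTT", 1),
    "NotI": ("GCGGCCGC", 2),
    "SalI": ("GTCGAC", 1),
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Ends can anneal (and be ligated) only if their overhangs base-pair."""
    return overhang(a) == revcomp(overhang(b))


print(overhang("MluI"), overhang("HindIII"))  # CGCG AGCT
print(compatible("MluI", "MluI"))             # True
print(compatible("MluI", "HindIII"))          # False
```

Because a Mlu I end cannot pair with a Hind III end, the two members of each pair can be cleaved and handled independently, which is the property the vector design relies on.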

The combining step randomly brings together different
Hc and Lc encoding sequences within the two diverse
populations into a single vector (Figure 1C; M13IXHL). The
vector sequences donated from each independent vector,
M13IX30 and M13IX11, are necessary for production of viable
phage. Also, since the pseudo gVIII sequences are
contained in M13IX30, coexpression of functional antibody
fragments as Lc-associated gVIII-Hc fusion proteins cannot
be accomplished on the phage surface until the vector
sequences are linked as shown in M13IXHL.

The combining step is performed by restricting each
population of Hc and Lc containing vectors with Mlu I and
Hind III, respectively. The 3' termini of each restricted
vector population are digested with a 3' to 5' exonuclease



WO 92/06204 



PCT/US91/07149 




as described above for inserting sequences into the cloning
sites. The vector populations are mixed, allowed to anneal
and introduced into an appropriate host. A non-suppressor
host (Figure 1D) is preferably used during initial
construction of the library to ensure that sequences are
not selected against due to expression as fusion proteins.
Phage isolated from the library constructed in a
non-suppressor strain can be used to infect a suppressor
strain for surface expression of antibody fragments.

The invention also provides a method for selecting a
heteromeric receptor exhibiting binding activity toward a
preselected molecule from a population of diverse
heteromeric receptors, comprising: (a) operationally
linking to a first vector a first population of diverse DNA
sequences encoding a diverse population of first
polypeptides, said first vector having two pairs of
restriction sites symmetrically oriented about a cloning
site; (b) operationally linking to a second vector a second
population of diverse DNA sequences encoding a diverse
population of second polypeptides, said second vector
having two pairs of restriction sites symmetrically
oriented about a cloning site in an identical orientation
to that of the first vector; (c) combining the vector
products of steps (a) and (b) under conditions which allow
only the operational combination of vector sequences
containing said first and second DNA sequences; (d)
introducing said population of combined vectors into a
compatible host under conditions sufficient for expressing
said population of first and second DNA sequences; and (e)
determining the heteromeric receptors which bind to said
preselected molecule. The invention further provides for
determining the nucleic acid sequences encoding such
polypeptides.

Surface expression of the antibody library is
performed in an amber suppressor strain. As described
above, the amber stop codon between the Hc sequence and the
gVIII sequence unlinks the two components in a
non-suppressor strain. Isolating the phage produced from
the non-suppressor strain and infecting a suppressor strain
will link the Hc sequences to the gVIII sequence during
expression (Figure 1E). Culturing the suppressor strain
after infection allows the coexpression on the surface of
M13 of all antibody species within the library as gVIII
fusion proteins (gVIII-Fab fusion proteins).
Alternatively, the DNA can be isolated from the
non-suppressor strain and then introduced into a suppressor
strain to accomplish the same effect.

The level of expression of gVIII-Fab fusion proteins
can additionally be controlled at the transcriptional
level. Both polypeptides of the gVIII-Fab fusion proteins
are under the inducible control of the Lac Z
promoter/operator system. Other inducible promoters can
work as well and are known by one skilled in the art. For
high levels of surface expression, the suppressor library
is cultured in an inducer of the Lac Z promoter such as
isopropylthio-β-galactoside (IPTG). Inducible control is
beneficial because biological selection against
non-functional gVIII-Fab fusion proteins can be minimized
by culturing the library under non-expressing conditions.
Expression can then be induced only at the time of
screening to ensure that the entire population of
antibodies within the library is accurately represented on
the phage surface. Inducible control can also be used to
control the valency of the antibody on the phage surface.

The surface expression library is screened for
specific Fab fragments which bind preselected molecules by
standard affinity isolation procedures. Such methods
include, for example, panning, affinity chromatography and
solid phase blotting procedures. Panning as described by
Parmley and Smith, Gene 73:305-318 (1988), which is
incorporated herein by reference, is preferred because high
titers of phage can be screened easily, quickly and in
small volumes. Furthermore, this procedure can select
minor Fab fragment species within the population, which
otherwise would have been undetectable, and amplify them to
substantially homogeneous populations. The selected Fab
fragments can be characterized by sequencing the nucleic
acids encoding the polypeptides after amplification of the
phage population.
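How repeated rounds of panning can bring a rare binder to near homogeneity can be seen with a simple enrichment model. The Python sketch below assumes that each round recovers a fixed fraction of binding phage and a much smaller fixed fraction of non-binding background, and that amplification between rounds does not change the relative proportions; the starting frequency and recovery values in the example are hypothetical, not measured.

```python
def pan(start_fraction: float, binder_recovery: float,
        background_recovery: float, rounds: int) -> list[float]:
    """Fraction of binding phage after each round of a simple panning model."""
    if not 0.0 <= start_fraction <= 1.0:
        raise ValueError("start_fraction must lie in [0, 1]")
    fractions = [start_fraction]
    f = start_fraction
    for _ in range(rounds):
        bound = f * binder_recovery                   # binders retained on the target
        background = (1.0 - f) * background_recovery  # non-specific carry-over
        f = bound / (bound + background)
        fractions.append(f)
    return fractions


# Hypothetical: one binder per 10^6 phage; 10% of binders and 0.001% of
# non-binders recovered in each round.
for i, f in enumerate(pan(1e-6, 0.1, 1e-5, 3)):
    print(f"round {i}: binder fraction {f:.3g}")
```

With a per-round enrichment of 10^4 in this model, a species present at one part per million makes up most of the population after two rounds, which illustrates why minor Fab species that would otherwise be undetectable can be recovered.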

The following examples are intended to illustrate but
not limit the invention.

EXAMPLE I 

Construction, Expression and Screening of
Antibody Fragments on the Surface of M13 

This example shows the synthesis of a diverse
population of heavy (Hc) and light (Lc) chain antibody
fragments and their expression on the surface of M13 as
gene VIII-Fab fusion proteins. The expressed antibodies
derive from the random mixing and coexpression of an Hc and
Lc pair. Also demonstrated is the isolation and
characterization of the expressed Fab fragments which bind
benzodiazapam (BDP), together with their corresponding
nucleotide sequences.

Isolation of mRNA and PCR Amplification of Antibody Fragments

The surface expression library is constructed from
mRNA isolated from a mouse that had been immunized with
KLH-coupled benzodiazapam (BDP). BDP was coupled to
keyhole limpet hemocyanin (KLH) using the techniques
described in Antibodies: A Laboratory Manual, Harlow and
Lane, eds., Cold Spring Harbor, New York (1988), which is
incorporated herein by reference. Briefly, 10.0 milligrams
(mg) of keyhole limpet hemocyanin and 0.5 mg of BDP bearing
glutaryl spacer and N-hydroxysuccinimide linker appendages
were coupled as described in Janda et al., Science,
241:1188 (1988), which is incorporated herein by reference.
The KLH-BDP conjugate was purified by gel filtration
chromatography through Sephadex G-25.

The KLH-BDP conjugate was prepared for injection into
mice by adding 100 µg of the conjugate to 250 µl of
phosphate buffered saline (PBS). An equal volume of
complete Freund's adjuvant was added and the entire
solution was emulsified for 5 minutes. Mice were injected
with 300 µl of the emulsion. Injections were given
subcutaneously at several sites using a 21 gauge needle. A
second immunization with BDP was given two weeks later.
This injection was prepared as follows: 50 µg of BDP was
diluted in 250 µl of PBS and an equal volume of alum was
mixed with the solution. The mice were injected
intraperitoneally with 500 µl of the solution using a 23
gauge needle. One month later the mice were given a final
injection of 50 µg of the conjugate diluted to 200 µl in
PBS. This injection was given intravenously in the lateral
tail vein using a 30 gauge needle. Five days after this
final injection the mice were sacrificed and total cellular
RNA was isolated from their spleens.

Total RNA was isolated from the spleen of a single
mouse immunized as described above by the method of
Chomczynski and Sacchi, Anal. Biochem., 162:156-159 (1987),
which is incorporated herein by reference. Briefly,
immediately after removing the spleen from the immunized
mouse, the tissue was homogenized in 10 ml of a denaturing
solution containing 4.0 M guanidine isothiocyanate, 0.25 M
sodium citrate at pH 7.0, and 0.1 M 2-mercaptoethanol using
a glass homogenizer. One ml of 2 M sodium acetate at pH 4.0
was mixed with the homogenized spleen. One ml of saturated
phenol was also mixed with the denaturing solution
containing the homogenized spleen. Two ml of a
chloroform:isoamyl alcohol (24:1 v/v) mixture was added to
this homogenate. The homogenate was mixed vigorously for
ten seconds and maintained on ice for 15 minutes. The
homogenate was then transferred to a thick-walled 50 ml
polypropylene centrifuge tube (Fisher Scientific Company,
Pittsburgh, PA). The solution was centrifuged at 10,000 x g
for 20 minutes at 4°C. The upper RNA-containing aqueous
layer was transferred to a fresh 50 ml polypropylene
centrifuge tube and mixed with an equal volume of isopropyl
alcohol. This solution was maintained at -20°C for at least
one hour to precipitate the RNA. The solution containing
the precipitated RNA was centrifuged at 10,000 x g for
twenty minutes at 4°C. The pelleted total cellular RNA was
collected and dissolved in 3 ml of the denaturing solution
described above. Three ml of isopropyl alcohol was added
to the resuspended total cellular RNA and vigorously mixed.
This solution was maintained at -20°C for at least 1 hour
to precipitate the RNA. The solution containing the
precipitated RNA was centrifuged at 10,000 x g for ten
minutes at 4°C. The pelleted RNA was washed once with a
solution containing 75% ethanol. The pelleted RNA was
dried under vacuum for 15 minutes and then resuspended in
diethyl pyrocarbonate (DEPC)-treated H2O (DEPC-H2O).

Poly A+ RNA for use in first strand cDNA synthesis was
prepared from the above isolated total RNA using a
spin-column kit (Pharmacia, Piscataway, NJ) as recommended
by the manufacturer. The basic methodology has been
described by Aviv and Leder, Proc. Natl. Acad. Sci. USA,
69:1408-1412 (1972), which is incorporated herein by
reference. Briefly, one half of the total RNA isolated from
a single immunized mouse spleen prepared as described above
was resuspended in 1 ml of DEPC-treated dH2O and maintained
at 65°C for five minutes. One ml of 2x high salt loading
buffer (100 mM Tris-HCl at pH 7.5, 1 M sodium chloride, 2.0
mM disodium ethylene diamine tetraacetic acid (EDTA) at pH
8.0, and 0.2% sodium dodecyl sulfate (SDS)) was added to
the resuspended RNA and the mixture was allowed to cool to
room temperature. The mixture was then applied to an
oligo-dT (Collaborative Research Type 2 or Type 3, Bedford,
MA) column that was previously prepared by washing the
oligo-dT with a solution containing 0.1 M sodium hydroxide
and 5 mM EDTA and then equilibrating the column with
DEPC-treated dH2O. The eluate was collected in a sterile
polypropylene tube and reapplied to the same column after
heating the eluate for 5 minutes at 65°C. The oligo-dT
column was then washed with 2 ml of high salt loading
buffer consisting of 50 mM Tris-HCl at pH 7.5, 500 mM
sodium chloride, 1 mM EDTA at pH 8.0 and 0.1% SDS. The
oligo-dT column was then washed with 2 ml of 1X medium
salt buffer (50 mM Tris-HCl at pH 7.5, 100 mM sodium
chloride, 1 mM EDTA at pH 8.0 and 0.1% SDS). The mRNA was
eluted with 1 ml of buffer consisting of 10 mM Tris-HCl at
pH 7.5, 1 mM EDTA at pH 8.0 and 0.05% SDS. The messenger
RNA was purified by extracting this solution with
phenol/chloroform followed by a single extraction with 100%
chloroform, then ethanol precipitated and resuspended in
DEPC-treated dH2O.

In preparation for PCR amplification, mRNA was used as
a template for cDNA synthesis. In a typical 250 µl reverse
transcription reaction mixture, 5-10 µg of spleen mRNA in
water was first annealed with 500 ng (0.5 pmol) of either
the 3' VH primer (primer 12, Table I) or the 3' VL primer
(primer 9, Table II) at 65°C for 5 minutes. Subsequently,
the mixture was adjusted to contain 0.8 mM dATP, 0.8 mM
dCTP, 0.8 mM dGTP, 0.8 mM dTTP, 100 mM Tris-HCl (pH 8.6),
10 mM MgCl2, 40 mM KCl, and 20 mM 2-ME. Moloney Murine
Leukemia Virus reverse transcriptase (Bethesda Research
Laboratories (BRL), Gaithersburg, MD), 26 units, was added
and the solution was incubated for 1 hour at 40°C. The
resultant first strand cDNA was phenol extracted, ethanol
precipitated and then used in the polymerase chain
reaction (PCR) procedures described below for amplification
of heavy and light chain sequences.



Primers used for amplification of heavy chain Fd
fragments for construction of the M13IX30 library are shown
in Table I. Amplification was performed in eight separate
reactions, as described by Saiki et al., Science,
239:487-491 (1988), which is incorporated herein by
reference, each reaction containing one of the 5' primers
(primers 2 to 9; SEQ ID NOS: 7 through 14, respectively)
and one of the 3' primers (primer 12; SEQ ID NO: 17) listed
in Table I. The remaining 5' primers, used for
amplification in a single reaction, are either a degenerate
primer (primer 1; SEQ ID NO: 6) or a primer that
incorporates inosine at four degenerate positions (primer
10; SEQ ID NO: 15). The remaining 3' primer (primer 11; SEQ
ID NO: 16) was used to construct Fv fragments. The
underlined portion of the 5' primers incorporates an Xho I
site and that of the 3' primer an Spe I restriction site
for cloning the amplified fragments into the M13IX30 vector
in a predetermined reading frame for expression.
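A degenerate primer such as primer 1 is a mixture of oligonucleotides. The Python sketch below expands a primer written in IUPAC ambiguity codes into every sequence it represents. It assumes that primer 1 of Table I reads 5'-AGGTSMARCTKCTCGAGTCWGG-3' in IUPAC notation (S = C/G, M = A/C, R = A/G, K = G/T, W = A/T). The inosine-containing primer 10 cannot be expanded this way, since each inosine is a single base analog rather than a mixture of bases.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by an IUPAC-degenerate primer."""
    # Raises KeyError for symbols outside the table, e.g. inosine "I".
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


# Assumed IUPAC reading of Table I, primer 1
variants = expand_degenerate("AGGTSMARCTKCTCGAGTCWGG")
print(len(variants))                           # 32 (five two-fold positions)
print(all("CTCGAG" in v for v in variants))    # True: Xho I site is not degenerate
print("AGGTCCAGCTGCTCGAGTCTGG" in variants)    # True: primer 2 is one member
```

Under this reading, each of the individual primers 2 to 9 is one member of the primer 1 mixture, which is consistent with primer 1 being used for amplification in a single reaction.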







TABLE I
HEAVY CHAIN PRIMERS

1) 5'-AGGTSMARCTKCTCGAGTCWGG-3'
2) 5'-AGGTCCAGCTGCTCGAGTCTGG-3'
3) 5'-AGGTCCAGCTGCTCGAGTCAGG-3'
4) 5'-AGGTCCAGCTTCTCGAGTCTGG-3'
5) 5'-AGGTCCAGCTTCTCGAGTCAGG-3'
6) 5'-AGGTCCAACTGCTCGAGTCTGG-3'
7) 5'-AGGTCCAACTGCTCGAGTCAGG-3'
8) 5'-AGGTCCAACTTCTCGAGTCTGG-3'
9) 5'-AGGTCCAACTTCTCGAGTCAGG-3'
10) 5'-AGGTIIAICTICTCGAGTCWGG-3'
11) 5'-CTATTAACTACTAACGGTAACAGTGGTGCCTTGCCCCA-3'
12) 5'-AGGCTTACTAGTACAATCCCTGGGCACAAT-3'

Degenerate positions: S = C or G; M = A or C; R = A or G;
K = G or T; W = A or T. I = inosine.

Primers used for amplification of mouse kappa light
chain sequences for construction of the M13IX11 library are
shown in Table II. These primers were chosen to contain
restriction sites which were compatible with the vector and not
present in the conserved sequences of the mouse light chain
mRNA. Amplification was performed as described above in
five separate reactions, each containing one of the 5'
primers (primers 3 to 7; SEQ ID NOS: 20 through 24,
respectively) and the 3' primer (primer 9; SEQ ID
NO: 26) listed in Table II. The remaining 3' primer
(primer 8; SEQ ID NO: 25) was used to construct Fv
fragments. The underlined portion of the 5' primers
depicts a Sac I restriction site and that of the 3' primers
an Xba I restriction site for cloning of the amplified
fragments into the M13IX11 vector in a predetermined
reading frame for expression.
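The selection criterion described above, a site present in the primer but absent from the conserved template sequence, can be expressed as a simple filter. In this sketch the template fragment is made up for illustration and is not a mouse kappa sequence; the primers are primers 3 and 9 of Table II.

```python
# Standard recognition sequences (5' to 3').
CANDIDATE_SITES = {
    "Sac I": "GAGCTC",
    "Xba I": "TCTAGA",
    "Eco RI": "GAATTC",
    "Hind III": "AAGCTT",
}

def usable_sites(primer: str, template: str, candidates: dict[str, str]) -> list[str]:
    """Enzymes whose site occurs in the primer but nowhere in the template."""
    primer, template = primer.upper(), template.upper()
    return sorted(
        name for name, motif in candidates.items()
        if motif in primer and motif not in template
    )

# HYPOTHETICAL template fragment, used only to exercise the filter.
TEMPLATE = "GACATTGTGCTCACCCAGTCTCCAGCAATCATGTCTGCA"

primer_3 = "CCAGTTCCGAGCTCGTGCTCACCCAGTCTCCA"     # Table II, primer 3
primer_9 = "GCGCCGTCTAGAATTAACACTCATTCCTGTTGAA"   # Table II, primer 9

print(usable_sites(primer_3, TEMPLATE, CANDIDATE_SITES))  # ['Sac I']
print(usable_sites(primer_9, TEMPLATE, CANDIDATE_SITES))  # ['Xba I']
```

A site that also occurs inside the template would be rejected, because digestion would then cut within the amplified variable region.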

TABLE II
LIGHT CHAIN PRIMERS

1) 5'-CCAGTTCCGAGCTCGTTGTGACTCAGGAATCT-3'
2) 5'-CCAGTTCCGAGCTCGTGTTGACGCAGCCGCCC-3'
3) 5'-CCAGTTCCGAGCTCGTGCTCACCCAGTCTCCA-3'
4) 5'-CCAGTTCCGAGCTCCAGATGACCCAGTCTCCA-3'
5) 5'-CCAGATGTGAGCTCGTGATGACCCAGACTCCA-3'
6) 5'-CCAGATGTGAGCTCGTCATGACCCAGTCTCCA-3'
7) 5'-CCAGTTCCGAGCTCGTGATGACACAGTCTCCA-3'
8) 5'-GCAGCATTCTAGAGTTTCAGCTCCAGCTTGCC-3'
9) 5'-GCGCCGTCTAGAATTAACACTCATTCCTGTTGAA-3'
PCR amplification for heavy and light chain fragments
was performed in a 100 µl reaction mixture containing the
above described products of the reverse transcription
reaction (about 5 µg of the cDNA-RNA hybrid), 300 nmol of 3' VH
primer (primer 12, Table I; SEQ ID NO: 17), and one of the
5' VH primers (primers 2-9, Table I; SEQ ID NOS: 7 through
14, respectively) for heavy chain amplification, or 300
nmol of 3' VL primer (primer 9, Table II; SEQ ID NO: 26)
and one of the 5' VL primers (primers 3-7, Table II; SEQ ID
NOS: 20 through 24, respectively) for each light chain
amplification, a mixture of dNTPs at 200 mM, 50 mM KCl, 10
mM Tris-HCl (pH 8.3), 15 mM MgCl2, 0.1% gelatin, and 2 units
of Thermus aquaticus DNA polymerase. The reaction mixture
was overlaid with mineral oil and subjected to 40 cycles of
amplification. Each amplification cycle involved
denaturation at 92 °C for 1 minute, annealing at 52 °C for 2
minutes, and elongation at 72 °C for 1.5 minutes. The
amplified samples were extracted twice with phenol/CHCl3 and
once with CHCl3, ethanol precipitated, and stored at -70 °C
in 10 mM Tris-HCl, pH 7.5, 1 mM EDTA. The resultant
products were used in constructing the M13IX30 and M13IX11
libraries (see below).
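The thermal program above can be written down as data, which makes the total run time easy to check. This is a minimal sketch; it counts only the programmed hold times and ignores ramping between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float

# One amplification cycle as described in the text.
CYCLE = [
    Step("denature", 92, 1.0),
    Step("anneal", 52, 2.0),
    Step("extend", 72, 1.5),
]
N_CYCLES = 40

def hold_time_minutes(cycle: list[Step], n_cycles: int) -> float:
    """Total time spent at the programmed temperatures, excluding ramp times."""
    return n_cycles * sum(step.minutes for step in cycle)

total = hold_time_minutes(CYCLE, N_CYCLES)
print(f"{total:.0f} min (~{total / 60:.1f} h) of hold time")  # 180 min (~3.0 h) of hold time
```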

Vector Construction 

Two M13-based vectors, M13IX30 (SEQ ID NO: 1) and
M13IX11 (SEQ ID NO: 2), were constructed for the cloning
and propagation of Hc and Lc populations of antibody
fragments, respectively. The vectors were constructed to
facilitate the random joining and subsequent surface
expression of antibody fragment populations.

M13IX30 (SEQ ID NO: 1), or the Hc vector, was
constructed to harbor diverse populations of Hc antibody
fragments. M13mp19 (Pharmacia, Piscataway, NJ) was the
starting vector. This vector was modified to contain, in
addition to the encoded wild type M13 gene VIII: (1) a
pseudo-wild type gene VIII sequence with an amber stop
codon between it and the restriction sites for cloning
oligonucleotides; (2) a Stu I restriction site for insertion
of sequences by hybridization, and Spe I and Xho I
restriction sites in-frame with the pseudo-wild type gene
VIII for cloning Hc sequences; (3) sequences necessary for
expression, such as a promoter, signal sequence and
translation initiation signals; (4) two pairs of Hind III-Mlu I
sites for random joining of Hc and Lc vector
portions; and (5) various other mutations to remove
redundant restriction sites and the amino terminal portion
of Lac Z.

Construction of M13IX30 was performed in four steps.
In the first step, an M13-based vector containing the
pseudo gVIII and various other mutations, M13IX01F, was
constructed. The second step involved the construction of a
small cloning site in a separate M13mp18 vector to yield
M13IX03. This vector was then expanded to contain
expression sequences and restriction sites for Hc sequences
to form M13IX04B. The fourth and final step involved the
incorporation of the newly constructed sequences in
M13IX04B into M13IX01F to yield M13IX30.

Construction of M13IX01F first involved the generation
of a pseudo wild-type gVIII sequence for surface expression
of antibody fragments. The pseudo-wild type gene encodes
the same amino acid sequence as the wild type
gene; however, the nucleotide sequence has been altered so
that only 63% identity exists between this gene and the
encoded wild type gene VIII. Modification of the gene VIII
nucleotide sequence used for surface expression reduces the
possibility of homologous recombination with the wild type
gene VIII contained on the same vector. Additionally, the
wild type M13 gene VIII was retained in the vector system
to ensure that at least some functional, non-fusion coat
protein would be produced. The inclusion of wild type gene
VIII facilitates the growth of phage under conditions where
there is surface expression of the polypeptides and
therefore reduces the possibility of non-viable phage
production from the fusion genes.
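The 63% figure is a per-position identity between the aligned wild type and pseudo-wild type genes. The sketch below shows how such a figure, together with the longest uninterrupted identical stretch, can be computed for two aligned sequences of equal length. The two 15-nucleotide sequences are hypothetical codon variants chosen for illustration, not the actual gene VIII sequences. Homologous recombination generally depends on uninterrupted stretches of identity, which is why breaking those stretches up is the aim of the recoding.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)

def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive matching positions."""
    best = current = 0
    for x, y in zip(a.upper(), b.upper()):
        current = current + 1 if x == y else 0
        best = max(best, current)
    return best

# HYPOTHETICAL aligned codon variants; both encode Ala-Glu-Gly-Asp-Asp.
variant_a = "GCAGAGGGTGACGAT"
variant_b = "GCTGAAGGCGATGAC"

print(f"{percent_identity(variant_a, variant_b):.1f}% identity")             # 66.7% identity
print("longest identical run:", longest_identical_run(variant_a, variant_b))  # 2
```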

The pseudo-wild type gene VIII was constructed by
chemically synthesizing a series of oligonucleotides which
encode both strands of the gene. The oligonucleotides are
presented in Table III.
TABLE III

Pseudo-Wild Type Gene VIII Oligonucleotide Series

Top Strand
Oligonucleotides

Sequence (5' to 3')



VIII 05 
VIII 06 
VIII 07 



GATCC TAG GCT GAA GGC 
GAT GAG CCT GCT AAG GCT 
GC 

A TTC AAT AGT TTA CAG 
GCA AGT GCT ACT GAG TAC 
A 

TT GGC TAC GCT TGG GCT 
ATG GTA GTA GTT ATA GTT 
GGT GCT ACC ATA GGG ATT 
AAA TTA TTC AAA AAG TT 
T ACG AGC AAG GCT TCT 
TA 



Bottom strand 
Oligonucleotides 



VIII 10 
VIII 11 
VIII 12 



AGC TTA AGA AGC CTT GCT 
CGT AAA CTT TTT GAA TAA 
TTT 

AAT CCC TAT GGT AGC ACC 
AAC TAT AAC TAC TAC CAT 
AGC CCA AGC GTA GCC AAT 
GTA CTC AGT AGC ACT TG 
C CTG TAA ACT ATT GAA 
TGC AGC CTT AGC AGG GTC 
ATC GCC TTC AGC CTA G 
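Because the bottom-strand oligonucleotides are written 5' to 3', each should be the reverse complement of the top-strand region it pairs with. The sketch below checks this for the start of the top strand and the 3' end of the last bottom-strand entry in Table III; the four unpaired bases (GATC) at the top-strand 5' end correspond to the Bam HI cohesive end described below.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

top_start = "GATCCTAGGCTGAAGGCGATGAG"   # start of the top strand (Table III)
bottom_end = "ATCGCCTTCAGCCTAG"         # 3' end of the last bottom-strand entry (Table III)

paired = reverse_complement(bottom_end)
offset = top_start.find(paired)
print(paired, offset)                                     # CTAGGCTGAAGGCGAT 4
print("single-stranded overhang:", top_start[:offset])    # GATC
```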



Except for the terminal oligonucleotides VIII 03 (SEQ
ID NO: 27) and VIII 08 (SEQ ID NO: 32), the above
oligonucleotides (oligonucleotides VIII 04-07 (SEQ ID NOS:
28 through 31, respectively) and VIII 09-12 (SEQ ID NOS: 33
through 36, respectively)) were mixed at 200 ng each in 10
µl final volume, phosphorylated with T4 polynucleotide
kinase (Pharmacia) and 1 mM ATP at 37 °C for 1 hour, heated
to 70 °C for 5 minutes, and annealed into double-stranded
form by heating to 65 °C for 3 minutes, followed by cooling
to room temperature over a period of 30 minutes. The
reactions were treated with 1.0 U of T4 DNA ligase (BRL)
and 1 mM ATP at room temperature for 1 hour, followed by
heating to 70 °C for 5 minutes. Terminal oligonucleotides
were then annealed to the ligated oligonucleotides. The
annealed and ligated oligonucleotides yielded a
double-stranded DNA flanked by a Bam HI site at its 5' end and by
a Hind III site at its 3' end. A translational stop codon
(amber) immediately follows the Bam HI site. The gene VIII
sequence begins with the codon GAA (Glu) two codons 3' to
the stop codon. The double-stranded insert was cloned in
frame with the Eco RI and Sac I sites within the M13
polylinker. To do so, M13mp19 was digested with Bam HI
(New England Biolabs, Beverly, MA) and Hind III (New
England Biolabs) and combined at a molar ratio of 1:10 with
the double-stranded insert. The ligations were performed
at room temperature overnight in 1X ligase buffer (50 mM
Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml
BSA) containing 1.0 U of T4 DNA ligase (New England
Biolabs). The ligation mixture was transformed into a host
and screened for positive clones using standard procedures
in the art.
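A vector:insert molar ratio such as the 1:10 used here is usually converted into masses using the lengths of the two molecules. The sketch below assumes 100 ng of vector, a vector length of about 7,250 bp (roughly that of M13mp19) and a hypothetical 150 bp insert; none of these amounts are stated in the text.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float) -> float:
    """Mass of insert giving `insert_per_vector` insert molecules per vector molecule.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector

# ASSUMED amounts and lengths, for illustration only.
ng = insert_mass_ng(vector_ng=100, vector_bp=7250, insert_bp=150, insert_per_vector=10)
print(f"{ng:.1f} ng insert")  # 20.7 ng insert
```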

Several mutations were generated within the construct
to yield functional M13IX01F. The mutations were generated
using the method of Kunkel et al., Meth. Enzymol. 154:367-382
(1987), which is incorporated herein by reference, for
site-directed mutagenesis. The reagents, strains and
protocols were obtained from a Bio-Rad Mutagenesis kit (Bio-Rad,
Richmond, CA) and mutagenesis was performed as
recommended by the manufacturer.
Two Fok I sites were removed from the vector as well
as the Hind III site at the end of the pseudo gene VIII
sequence using the mutant oligonucleotides
5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 37) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 38). New Hind III and
Mlu I sites were also introduced at positions 3919 and 3951
of M13IX01F. The oligonucleotides used for this
mutagenesis had the sequences
5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 39) and
5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 40), respectively.
The amino terminal portion of Lac Z was deleted by
oligonucleotide-directed mutagenesis using the mutant
oligonucleotide 5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3'
(SEQ ID NO: 41). In constructing the above mutations, all
changes made in an M13 coding region were performed such
that the amino acid sequence remained unaltered. The
resultant vector, M13IX01F, was used in the final step to
construct M13IX30 (see below).
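Whether a site-introducing mutation leaves the encoded protein unchanged can be checked by translating the region before and after the change. The sketch below builds the standard genetic code and applies it to a hypothetical nine-base stretch in which a single T-to-C change creates an Mlu I site (ACGCGT); the sequences are illustrative and are not taken from M13IX01F.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence using the standard genetic code."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

def is_silent(before: str, after: str) -> bool:
    """True if both in-frame sequences encode the same peptide."""
    return len(before) == len(after) and translate(before) == translate(after)

MLU_I = "ACGCGT"
before = "GAACGTGTT"  # HYPOTHETICAL: Glu-Arg-Val
after = "GAACGCGTT"   # CGT -> CGC (both Arg) creates ACGCGT

print(translate(before), translate(after))                        # ERV ERV
print(is_silent(before, after), MLU_I in before, MLU_I in after)  # True False True
```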

In the second step, M13mp18 was mutated to remove the
5' end of Lac Z up to the Lac I binding site, including
the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and an Mlu I site
was introduced in the coding region of Lac Z. A single
oligonucleotide was used for these mutations and had the
sequence 5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3'
(SEQ ID NO: 42). Restriction enzyme sites for Hind III and
Eco RI were introduced downstream of the Mlu I site using
the oligonucleotide 5'-GGCGAaAGGGAAtTCtGCAAGGCGATTAAGCTTGGGTAACGCC-3'
(SEQ ID NO: 43). These modifications of M13mp18
yielded the precursor vector M13IX03.

The expression sequences and cloning sites were
introduced into M13IX03 by chemically synthesizing a series
of oligonucleotides which encode both strands of the
desired sequence. The oligonucleotides are presented in
Table IV.
TABLE IV
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides   Sequence (5' to 3')

084   GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027   TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028   TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029   TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides   Sequence (5' to 3')

085   TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031   GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032   TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033   GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA



The above oligonucleotides of Table IV, except for the
terminal oligonucleotides 084 (SEQ ID NO: 44) and 085 (SEQ
ID NO: 48), were mixed, phosphorylated, annealed and
ligated to form a double-stranded insert as described in
Example I. However, instead of cloning directly into the
intermediate vector, the insert was first amplified by PCR.
The terminal oligonucleotides were used as primers for PCR.
Oligonucleotide 084 (SEQ ID NO: 44) contains a Hind III
site 10 nucleotides internal to its 5' end and
oligonucleotide 085 (SEQ ID NO: 48) has an Eco RI site at
its 5' end. Following amplification, the products were
restricted with Hind III and Eco RI and ligated, as
described in Example I, into the polylinker of M13mp18
digested with the same two enzymes. The resultant
double-stranded insert contained a ribosome binding site, a
translation initiation codon followed by a leader sequence
and three restriction enzyme sites for cloning random
oligonucleotides (Xho I, Stu I, Spe I). The intermediate
vector was named M13IX04.
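The effect of the Hind III and Eco RI digestion on the ends of the PCR product can be previewed in silico. The sketch below cuts a top-strand sequence at each recognition site using the standard top-strand cut positions (A^AGCTT for Hind III, G^AATTC for Eco RI); it ignores the complementary strand and therefore the four-base 5' overhangs. It is applied to oligonucleotide 084 and to the first 22 bases of oligonucleotide 085 from Table IV.

```python
# Recognition sequence and top-strand cut offset (bases after the start of the site).
ENZYMES = {
    "Hind III": ("AAGCTT", 1),  # A^AGCTT
    "Eco RI": ("GAATTC", 1),    # G^AATTC
}

def digest_top_strand(seq: str, enzyme: str) -> list[str]:
    """Split a top-strand sequence at every site for `enzyme` (other strand ignored)."""
    motif, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = 0
    while (i := seq.find(motif, start)) != -1:
        cuts.append(i + offset)
        start = i + 1
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

oligo_084 = "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG"   # Table IV
oligo_085_start = "TGGCGAAAGGGAATTCGGATCC"           # first 22 bases of 085, Table IV

print(digest_top_strand(oligo_084, "Hind III"))
print(digest_top_strand(oligo_085_start, "Eco RI"))
```

The first fragment of 084 is the 11-base end that is trimmed off, which is why the site is placed 10 nucleotides in from the 5' end: the extra bases give the enzyme room to bind and cut.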

During cloning of the double-stranded insert, it was
found that one of the GCC codons in oligonucleotide 028
and its complement in 031 was deleted. Since this deletion
did not affect function, the final construct is missing one
of the two GCC codons. Additionally, oligonucleotide 032
(SEQ ID NO: 50) contained a GTG codon where a GAG codon was
needed. Mutagenesis was performed using the
oligonucleotide 5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 52)
to convert the codon to the desired sequence. The
resultant vector is named M13IX04B.
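The reason a missing GCC codon is tolerated, while a deletion of one or two bases would not be, is that removing a multiple of three bases leaves every downstream codon intact. The sketch below illustrates this on a hypothetical in-frame stretch modeled on the GCCGCC region of oligonucleotide 028; the reading frame assigned to it is an assumption.

```python
def codons(seq: str) -> list[str]:
    """Split into complete codons from the first base; a trailing partial codon is dropped."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]

# HYPOTHETICAL in-frame stretch: AAA GCC GCC CAG GTC CAG CTG
region = "AAAGCCGCCCAGGTCCAGCTG"
downstream = codons(region)[3:]                   # codons after the GCC GCC pair

codon_deleted = delete(region, 3, 3)              # remove one GCC codon
base_deleted = delete(region, 3, 1)               # remove a single base instead

print(codons(codon_deleted))                      # ['AAA', 'GCC', 'CAG', 'GTC', 'CAG', 'CTG']
print(codons(codon_deleted)[2:] == downstream)    # True: downstream frame preserved
print(codons(base_deleted)[3:] == downstream)     # False: frameshift
```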

The fourth step in constructing M13IX30 involved
inserting the expression and cloning sequences from
M13IX04B upstream of the pseudo wild-type gVIII in
M13IX01F. This was accomplished by digesting M13IX04B with
Dra III and Bam HI and gel isolating the 700 base pair
insert containing the sequences of interest. M13IX01F was
likewise digested with Dra III and Bam HI. The insert was
combined with the double digested vector at a molar ratio
of 1:1 and ligated as described in Example I. The sequence
of the final construct, M13IX30, is shown in Figure 2 (SEQ
ID NO: 1). Figure 1A also shows M13IX30, where each of the
elements necessary for surface expression of Hc fragments
is marked. It should be noted that during modification of the
vectors, certain sequences differed from the published
sequence of M13mp18. The new sequences are incorporated
into the sequences recorded herein.

M13IX11 (SEQ ID NO: 2), or the Lc vector, was
constructed to harbor diverse populations of Lc antibody
fragments. This vector was also constructed from M13mp19
and contains: (1) sequences necessary for expression, such
as a promoter, signal sequence and translation initiation
signals; (2) an Eco RV restriction site for insertion of
sequences by hybridization and Sac I and Xba I restriction
sites for cloning of Lc sequences; (3) two pairs of
Hind III-Mlu I sites for random joining of Hc and Lc vector
portions; and (4) various other mutations to remove
redundant restriction sites.

The expression sequences, translation initiation signals,
cloning sites, and one of the Mlu I sites were constructed
by annealing of overlapping oligonucleotides as described
above to produce a double-stranded insert containing a 5'
Eco RI site and a 3' Hind III site. The overlapping
oligonucleotides are shown in Table V and were ligated as
a double-stranded insert between the Eco RI and Hind III
sites of M13mp18 as described for the expression sequences
inserted into M13IX03. The ribosome binding site (AGGAGAC)
is located in oligonucleotide 015 and the translation
initiation codon (ATG) is the first three nucleotides of
oligonucleotide 016 (SEQ ID NO: 55).
TABLE V
Oligonucleotide Series for Construction of
Translation Signals in M13IX11

Oligonucleotide   Sequence (5' to 3')

082   CACC TTCATG AATTC GGC AAG GAGACA GTCAT
015   AATT C GCC AAG GAG ACA GTC AT
016   AATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GGA TTG TT
017   ATTA CTC GCT GCC CAA CCA GCC ATG GCC GAG CTC GTG AT
018   GACC CAG ACT CCA GATATC CAA CAG GAA TGA GTG TTA AT
019   TCT AGA ACG CGT C
083   TTCAGGTTGAAGC TTA CGC GTT CTA GAA TTA ACA CTC ATT CCTGT
021   TG GAT ATC TGG AGT CTG GGT CAT CAC GAG CTC GGC CAT G
022   GC TGG TTG GGC AGC GAG TAA TAA CAA TCC AGC GGC TGC C
023   GT AGG CAA TAG GTA TTT CAT TAT GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 56) contained a Sac I
restriction site 67 nucleotides downstream from the ATG
codon. The naturally occurring Eco RI site was removed and
new Eco RI and Hind III sites were introduced downstream
from the Sac I site. The oligonucleotides
5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 63) and
5'-TAACACTCATTCCGGATGGAATTCtGGAGTCTGGGT-3' (SEQ ID NO: 64)
were used to generate each of the mutations, respectively.
The Lac Z ribosome binding site was removed when the
original Eco RI site in M13mp19 was mutated. Additionally,
when the new Eco RI and Hind III sites were generated, a
spontaneous 100 bp deletion was found just 3' to these
sites. Since the deletion does not affect function, it
was retained in the final vector.

In addition to the above mutations, a variety of other
modifications were made to incorporate or remove certain
sequences. The Hind III site used to ligate the
double-stranded insert was removed with the oligonucleotide
5'-GCCAGTGCCAAGTGACGCGTTCTA-3' (SEQ ID NO: 65). A second
Hind III site and a second Mlu I site were introduced at positions 3922 and
3952, respectively, using the oligonucleotides
5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 66) for the
Hind III mutagenesis and 5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID
NO: 67) for the Mlu I mutagenesis. Again, mutations within
the coding region did not alter the amino acid sequence.

The sequence of the resultant vector, M13IX11, is
shown in Figure 3 (SEQ ID NO: 2). Figure 1B also shows
M13IX11, where each of the elements necessary for producing
a surface expression library of Lc fragments is
marked.

Library Construction

Each population of Hc and Lc sequences synthesized by
PCR above is separately cloned into M13IX30 and M13IX11,
respectively, to create Hc and Lc libraries.

The Hc and Lc products (5 µg) are mixed, ethanol
precipitated and resuspended in 20 µl of NaOAc buffer (33
mM Tris acetate, pH 7.9, 10 mM Mg-acetate, 66 mM K-acetate,
0.5 mM DTT). Five units of T4 DNA polymerase are added and
the reactions incubated at 30 °C for 5 minutes to remove 3'
termini by exonuclease digestion. Reactions are stopped by
heating at 70 °C for 5 minutes. M13IX30 is digested with
Stu I and M13IX11 is digested with Eco RV. Both vectors
are treated with T4 DNA polymerase as described above and
combined with the appropriate PCR products at a 1:1 molar
ratio at 10 ng/µl to anneal in the above buffer at room
temperature overnight. DNA from each annealing is
electroporated into MK30-3 (Boehringer, Indianapolis, IN),
as described below, to generate the Hc and Lc libraries.

E. coli MK30-3 is electroporated as described by Smith
et al., Focus 12:38-40 (1990), which is incorporated herein
by reference. The cells are prepared by inoculating a
fresh colony of MK30-3 into 5 ml of SOB without magnesium
(20 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g
NaCl, 0.186 g KCl, dH2O to 1,000 ml) and grown with
vigorous aeration overnight at 37 °C. SOB without magnesium
(500 ml) is inoculated at 1:1000 with the overnight culture
and grown with vigorous aeration at 37 °C until the OD550 is
0.8 (about 2 to 3 h). The cells are harvested by
centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor
(Sorvall, Newtown, CT) at 4 °C for 10 minutes, resuspended
in 500 ml of ice-cold 10% (v/v) sterile glycerol,
centrifuged and resuspended a second time in the same
manner. After a third centrifugation, the cells are
resuspended in 10% sterile glycerol at a final volume of
about 2 ml, such that the OD550 of the suspension is 200 to
300. Usually, resuspension is achieved in the 10% glycerol
that remains in the bottle after pouring off the
supernatant. Cells are frozen in 40 µl aliquots in
microcentrifuge tubes using a dry ice-ethanol bath and
stored frozen at -70 °C.

Frozen cells are electroporated by thawing slowly on
ice before use and mixing with about 10 pg to 500 ng of
vector per 40 µl of cell suspension. A 40 µl aliquot is
placed in a 0.1 cm electroporation chamber (Bio-Rad,
Richmond, CA) and pulsed once at 0 °C using a 4 kΩ parallel
resistor, 25 µF and 1.88 kV, which gives a pulse length (τ) of
about 4 ms. A 10 µl aliquot of the pulsed cells is diluted
into 1 ml SOC (98 ml SOB plus 1 ml of 2 M MgCl2 and 1 ml of
2 M glucose) in a 12- x 75-mm culture tube, and the culture
is shaken at 37 °C for 1 hour prior to culturing in
selective media (see below).
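The pulse settings can be related through a simple RC-discharge model, in which the time constant τ equals the capacitance times the effective resistance of the parallel resistor and the sample in combination. The sketch below computes the field strength across the 0.1 cm gap and, under that assumed model, the resistance implied by a 4 ms pulse at 25 µF. Because the text reports τ only approximately, the derived resistances are rough estimates.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal field across the cuvette gap."""
    return voltage_kv / gap_cm

def effective_resistance_ohm(tau_s: float, capacitance_f: float) -> float:
    """R = tau / C for an exponential RC discharge (assumed model)."""
    return tau_s / capacitance_f

def sample_resistance_ohm(r_effective: float, r_parallel: float) -> float:
    """Solve 1/R_eff = 1/R_sample + 1/R_parallel for R_sample."""
    return 1.0 / (1.0 / r_effective - 1.0 / r_parallel)

field = field_strength_kv_per_cm(1.88, 0.1)        # about 18.8 kV/cm
r_eff = effective_resistance_ohm(4e-3, 25e-6)      # about 160 ohm
r_sample = sample_resistance_ohm(r_eff, 4000.0)    # about 167 ohm

print(f"field {field:.1f} kV/cm, R_eff {r_eff:.0f} ohm, sample ~{r_sample:.0f} ohm")
```

Under this model the low effective resistance is dominated by the sample itself, which is why the suspension is kept in low-ionic-strength 10% glycerol.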

Each of the libraries is cultured using methods known
to one skilled in the art. Such methods can be found in
Sambrook et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory, Cold Spring Harbor, 1989,
and in Ausubel et al., Current Protocols in Molecular
Biology, John Wiley and Sons, New York, 1989, both of which
are incorporated herein by reference. Briefly, the above
1 ml library cultures are grown up by diluting 50-fold into
2XYT media (16 g tryptone, 10 g yeast extract, 5 g NaCl)
and culturing at 37 °C for 5-8 hours. The bacteria are
pelleted by centrifugation at 10,000 x g. The supernatant
containing phage is transferred to a sterile tube and
stored at 4 °C.

Double-stranded vector DNA containing Hc and Lc antibody
fragments is isolated from the cell pellet of each
library. Briefly, the pellet is washed in TE (10 mM Tris,
pH 8.0, 1 mM EDTA) and recollected by centrifugation at
7,000 rpm for 5 minutes in a Sorvall centrifuge (Newtown, CT).
Pellets are resuspended in 6 ml of 10% sucrose, 50 mM
Tris, pH 8.0. Lysozyme (3.0 ml of 10 mg/ml) is added and the
mixture incubated on ice for 20 minutes. Then 12 ml of 0.2 M NaOH, 1%
SDS is added, followed by 10 minutes on ice. The
suspensions are then incubated on ice for 20 minutes after
addition of 7.5 ml of 3 M NaOAc, pH 4.6. The samples are
centrifuged at 15,000 rpm for 15 minutes at 4 °C, RNase treated and
extracted with phenol/chloroform, followed by ethanol
precipitation. The pellets are resuspended and weighed, and an
equal weight of CsCl is dissolved into each tube until a
density of 1.60 g/ml is achieved. EtBr is added to 600
µg/ml and the double-stranded DNA is isolated by
equilibrium centrifugation in a TV-1665 rotor (Sorvall) at
50,000 rpm for 6 hours. These DNAs from each right and
left half sublibrary are used to generate forty libraries
in which the right and left halves of the randomized
oligonucleotides have been randomly joined together.

The surface expression library is formed by the random
joining of the Hc containing portion of M13IX30 with the Lc
containing portion of M13IX11. The DNAs isolated from each
library are digested separately with an excess amount of
restriction enzyme. The Lc population (5 µg) is digested
with Hind III. The Hc population (5 µg) is digested with
Mlu I. The reactions are stopped by phenol/chloroform
extraction followed by ethanol precipitation. The pellets
are washed in 70% ethanol and resuspended in 20 µl of NaOAc
buffer. Five units of T4 DNA polymerase (Pharmacia) are
added and the reactions incubated at 30 °C for 5 minutes.
Reactions are stopped by heating at 70 °C for 5 minutes.
The Hc and Lc DNAs are mixed to a final concentration of 10
ng of each vector per µl and allowed to anneal at room temperature
overnight. The mixture is electroporated into MK30-3 cells
as described above.
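Because any Hc half can in principle join any Lc half, the number of possible combinations is the product of the two library sizes. The sketch below computes this and, assuming every combination is equally likely, uses the Poisson approximation 1 - exp(-N/D) for the fraction of D combinations expected to be represented at least once among N transformants. The library sizes and transformant number are hypothetical.

```python
import math

def combinatorial_diversity(n_heavy: int, n_light: int) -> int:
    """Number of distinct Hc/Lc pairings if every Hc half can join every Lc half."""
    return n_heavy * n_light

def fraction_represented(n_transformants: float, diversity: float) -> float:
    """Expected fraction of equally likely pairings seen at least once (Poisson)."""
    return 1.0 - math.exp(-n_transformants / diversity)

# HYPOTHETICAL sizes, for illustration only.
diversity = combinatorial_diversity(100_000, 100_000)   # 10**10 pairings
coverage = fraction_represented(1e8, diversity)          # about 0.01
print(diversity, f"{coverage:.4f}")
```

In this example only about 1% of the possible pairings would be sampled, which illustrates why transformation efficiency, rather than the size of either half-library, tends to limit combinatorial libraries.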

Screening of Surface Expression Libraries 

Purified phage are prepared from 50 ml liquid cultures
of XL1 Blue™ cells (Stratagene, La Jolla, CA) which had
been infected at an m.o.i. of 10 from the phage stocks
stored at 4 °C. The cultures are induced with 2 mM IPTG.
Supernatants are cleared by two centrifugations, and the
phage are precipitated by adding 1/7.5 volumes of PEG
solution (25% PEG-8000, 2.5 M NaCl), followed by incubation
at 4 °C overnight. The precipitate is recovered by
centrifugation for 90 minutes at 10,000 x g. Phage pellets
are resuspended in 25 ml of 0.01 M Tris-HCl, pH 7.6, 1.0 mM
EDTA, and 0.1% Sarkosyl and then shaken slowly at room
temperature for 30 minutes. The solutions are adjusted to
0.5 M NaCl and to a final concentration of 5% polyethylene
glycol. After 2 hours at 4 °C, the precipitates containing
the phage are recovered by centrifugation for 1 hour at
15,000 x g. The precipitates are resuspended in 10 ml of
NET buffer (0.1 M NaCl, 1.0 mM EDTA, and 0.01 M Tris-HCl,
pH 7.6), mixed well, and the phage repelleted by
centrifugation at 170,000 x g for 3 hours. The phage
pellets are resuspended overnight in 2 ml of NET buffer and
subjected to cesium chloride centrifugation for 18 hours at
110,000 x g (3.86 g of cesium chloride in 10 ml of buffer).
Phage bands are collected, diluted 7-fold with NET buffer,
recentrifuged at 170,000 x g for 3 hours, resuspended, and
stored at 4 °C in 0.3 ml of NET buffer containing 0.1 mM
sodium azide.
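Adding 1/7.5 volume of the PEG solution dilutes it into the cleared supernatant. The sketch below computes the resulting PEG and NaCl concentrations and the volume of PEG solution needed for a 50 ml supernatant. It assumes the supernatant volume is unchanged by clearing and ignores any NaCl already present in the culture medium.

```python
def final_concentration(stock: float, added_volumes: float) -> float:
    """Concentration after mixing `added_volumes` volumes of stock into 1 volume
    of sample that contains none of the component (C1*V1 = C2*V2)."""
    return stock * added_volumes / (1.0 + added_volumes)

fraction = 1 / 7.5
peg_percent = final_concentration(25.0, fraction)   # about 2.94% PEG-8000
nacl_molar = final_concentration(2.5, fraction)     # about 0.29 M NaCl (from the PEG solution only)
peg_ml_for_50ml = 50 * fraction                     # about 6.7 ml of PEG solution

print(f"PEG-8000 {peg_percent:.2f}%, NaCl {nacl_molar:.2f} M, "
      f"PEG solution for 50 ml: {peg_ml_for_50ml:.1f} ml")
```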

The BDP used for panning on streptavidin-coated dishes
is first biotinylated and then adsorbed against UV-inactivated
blocking phage (see below). The biotinylating
reagent is dissolved in dimethylformamide at a ratio of
2.4 mg solid NHS-SS-Biotin (sulfosuccinimidyl
2-(biotinamido)ethyl-1,3'-dithiopropionate; Pierce, Rockford,
IL) to 1 ml solvent and used as recommended by the
manufacturer. Small-scale reactions are accomplished by
mixing 1 µl of dissolved reagent with 43 µl of 1 mg/ml BDP
diluted in sterile bicarbonate buffer (0.1 M NaHCO3, pH
8.6). After 2 hours at 25 °C, residual biotinylating
reagent is reacted with 500 µl of 1 M ethanolamine (pH
adjusted to 9 with HCl) for an additional 2 hours. The
entire sample is diluted with 1 ml TBS containing 1 mg/ml
BSA, concentrated to about 50 µl on a Centricon 30
ultrafilter (Amicon), and washed on the same filter three times
with 2 ml TBS and once with 1 ml TBS containing 0.02% NaN3
and 7 x 10^12 UV-inactivated blocking phage (see below); the
final retentate (60-80 µl) is stored at 4 °C. BDP
biotinylated with the NHS-SS-Biotin reagent is linked to
biotin via a disulfide-containing chain.
UV-irradiated M13 phage are used for blocking any
biotinylated BDP which fortuitously binds filamentous phage
in general. M13mp8 (Messing and Vieira, Gene 19:269-276
(1982), which is incorporated herein by reference) is
chosen because it carries two amber mutations, which ensure
that the few phage surviving irradiation will not grow in
the sup0 strains used to titer the surface expression
library. A 5 ml sample containing 5 x 10^13 M13mp8 phage,
purified as described above, is placed in a small petri
plate and irradiated with a germicidal lamp at a distance
of two feet for 7 minutes (flux 150 µW/cm²). NaN3 is added
to 0.02% and phage particles are concentrated to 10^14
particles/ml on a Centricon 30-kDa ultrafilter (Amicon).
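The irradiation conditions correspond to a UV dose equal to flux multiplied by exposure time, and the concentration target fixes the final volume. The sketch below works through both numbers from the text, assuming no loss of particles on the ultrafilter.

```python
def uv_dose_mj_per_cm2(flux_uw_per_cm2: float, minutes: float) -> float:
    """Dose = flux x time; uW/cm^2 x s = uJ/cm^2, converted to mJ/cm^2."""
    return flux_uw_per_cm2 * minutes * 60 / 1000.0

def volume_after_concentration_ml(total_particles: float, target_per_ml: float) -> float:
    """Final volume needed to reach the target particle concentration (no losses assumed)."""
    return total_particles / target_per_ml

dose = uv_dose_mj_per_cm2(150, 7)                       # 63 mJ/cm^2
final_ml = volume_after_concentration_ml(5e13, 1e14)    # 0.5 ml

print(f"UV dose {dose:.0f} mJ/cm^2; concentrate to about {final_ml:.1f} ml")
```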

For panning, polystyrene petri plates (60 x 15 mm) are
incubated with 1 ml of 1 mg/ml streptavidin (BRL) in 0.1
M NaHCO3, pH 8.6, 0.02% NaN3 in a small, air-tight plastic box
overnight in a cold room. The next day the streptavidin is
removed and replaced with at least 10 ml blocking solution
(29 mg/ml BSA; 3 µg/ml streptavidin; 0.1 M NaHCO3, pH
8.6; 0.02% NaN3) and incubated for at least 1 hour at room
temperature. The blocking solution is removed and plates
are washed rapidly three times with Tris-buffered saline
containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing antibody fragments which
bind BDP is performed with 5 µl (2.7 µg BDP) of blocked
biotinylated BDP reacted with a 50 µl portion of the
library. Each mixture is incubated overnight at 4 °C,
diluted with 1 ml TBS-0.5% Tween 20, and transferred to a
streptavidin-coated petri plate prepared as described
above. After rocking 10 minutes at room temperature,
unbound phage are removed and plates washed ten times with
TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound
phage are eluted from plates with 800 µl sterile elution
buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with
glycine) for 15 minutes and eluates neutralized with 48 µl
2 M Tris (pH unadjusted). A 20 µl portion of each eluate
is titered on MK30-3 concentrated cells with dilutions of
input phage.

A second round of panning is performed by treating 750
µl of first eluate from the library with 5 mM DTT for 10
minutes to break disulfide bonds linking biotin groups to
residual biotinylated binding proteins. The treated eluate
is concentrated on a Centricon 30 ultrafilter (Amicon),
washed three times with TBS-0.5% Tween 20, and concentrated
to a final volume of about 50 µl. The final retentate is
transferred to a tube containing 5.0 µl (2.7 µg BDP)
blocked biotinylated BDP and incubated overnight. The
solution is diluted with 1 ml TBS-0.5% Tween 20, panned,
and eluted as described above on fresh streptavidin-coated
petri plates. The entire second eluate (800 µl) is
neutralized with 48 µl 2 M Tris, and 20 µl is titered
simultaneously with the first eluate and dilutions of the
input phage. If necessary, further rounds of panning can
be performed to obtain homogeneous populations of phage.
Additionally, phage can be plaque purified if reagents are
available for detection.
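Progress between panning rounds is commonly judged by the yield (eluted phage divided by input phage) and by the ratio of yields between rounds. The sketch below scales the titer of the 20 µl aliquot up to the whole neutralized eluate (800 µl elution buffer plus 48 µl Tris) and compares two rounds. The titers and input numbers are hypothetical.

```python
def total_eluted_pfu(pfu_in_aliquot: float, aliquot_ul: float, eluate_ul: float) -> float:
    """Scale the titer of the titered aliquot up to the whole neutralized eluate."""
    return pfu_in_aliquot * eluate_ul / aliquot_ul

def phage_yield(eluted_pfu: float, input_pfu: float) -> float:
    """Fraction of input phage recovered in the eluate."""
    return eluted_pfu / input_pfu

ELUATE_UL = 800 + 48   # elution buffer plus 2 M Tris used for neutralization
ALIQUOT_UL = 20

# HYPOTHETICAL titers, for illustration only.
round1 = phage_yield(total_eluted_pfu(2_500, ALIQUOT_UL, ELUATE_UL), input_pfu=1e11)
round2 = phage_yield(total_eluted_pfu(5_000, ALIQUOT_UL, ELUATE_UL), input_pfu=1e9)
enrichment = round2 / round1

print(f"yield round 1 {round1:.2e}, round 2 {round2:.2e}, enrichment {enrichment:.0f}x")
```

A rising yield across rounds is the usual sign that binding phage, rather than background, dominate the eluate.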

Template Preparation and Sequencing

Templates are prepared for sequencing by inoculating
a 1 ml culture of 2XYT containing a 1:100 dilution of an
overnight culture of XL1 with an individual plaque from the
purified population. The plaques are picked using a
sterile toothpick. The culture is incubated at 37 °C for 5-6
hours with shaking and then transferred to a 1.5 ml
microfuge tube. PEG solution (200 µl) is added, and the tube is
vortexed and placed on ice for 10 minutes. The phage
precipitate is recovered by centrifugation in a microfuge
at 12,000 x g for 5 minutes. The supernatant is discarded
and the pellet is resuspended in 230 µl of TE (10 mM Tris-HCl,
pH 7.5, 1 mM EDTA) by gently pipetting with a yellow
pipet tip. Phenol (200 µl) is added, followed by a brief
vortex, and the sample is microfuged to separate the phases. The aqueous
phase is transferred to a separate tube and extracted with
200 µl of phenol/chloroform (1:1) as described above for
the phenol extraction. A 0.1 volume of 3 M NaOAc is added,
followed by addition of 2.5 volumes of ethanol, and the templates are
precipitated at -20 °C for 20 minutes. The precipitated
templates are recovered by centrifugation in a microfuge at
12,000 x g for 8 minutes. The pellet is washed in 70%
ethanol, dried and resuspended in 25 µl TE. Sequencing was
performed using a Sequenase™ sequencing kit following the
protocol supplied by the manufacturer (U.S. Biochemical,
Cleveland, OH).

EXAMPLE II 

Cloning of Heavy and Light Chain Sequences 

Without Restriction Enzyme Digestion 

This example shows the simultaneous incorporation of 
antibody heavy and light chain fragment encoding sequences 
into an M13IXHL-type vector without the use of restriction 
endonucleases. 

For the simultaneous incorporation of heavy and light 
chain encoding sequences into a single coexpression vector, 
an M13IXHL vector was produced that contained heavy and 
light chain encoding sequences for a mouse monoclonal 
antibody (DAN-18H4; Biosite, San Diego, CA). The inserted 
antibody fragment sequences are used as complementary 
sequences for the hybridization and incorporation of Hc and 
Lc sequences by site-directed mutagenesis. The genes 
encoding the heavy and light chain polypeptides were 
inserted into M13IX30 (SEQ ID NO: 1) and M13IX11 (SEQ ID 
NO: 2), respectively, and combined into a single surface 
expression vector as described in Example I. The resultant 
M13IXHL-type vector is termed M13IX50. 
The combinations were performed under conditions that 
facilitate the formation of one Hc and one Lc vector half 
into a single circularized vector. Briefly, the overhangs 
generated between the pairs of restriction sites after 
restriction with Mlu I or Hind III and exonuclease 
digestion are unequal (i.e., 64 nucleotides compared to 32 
nucleotides). These unequal lengths result in differential 
hybridization temperatures for specific annealing of the 
complementary ends from each vector. The specific 
hybridization of each end of each vector half was 
accomplished by first annealing at 65 °C in a small volume 
(about 100 µg/ml) to form a dimer of one Hc vector half and 
one Lc vector half. The dimers were circularized by 
diluting the mixture (to about 20 ng/µl) and lowering the 
temperature to about 25-37 °C to allow annealing. T4 ligase 
was present to covalently close the circular vectors. 
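The two concentrations in the annealing and circularization steps imply roughly a five-fold dilution. The sketch below takes the first concentration as about 100 µg/ml (that is, 100 ng/µl) and the second as about 20 ng/µl, and assumes a hypothetical 10 µl annealing reaction; the function name is also hypothetical.

```python
def total_volume_after_dilution(c_start: float, c_final: float, v_start: float) -> float:
    """Final volume that lowers c_start to c_final (C1 * V1 = C2 * V2).

    Both concentrations must be in the same units; the result is in the
    units of v_start.
    """
    if not 0 < c_final <= c_start:
        raise ValueError("need 0 < c_final <= c_start")
    return c_start * v_start / c_final


annealing_ng_per_ul = 100.0  # about 100 ug/ml = 100 ng/ul (dimer formation at 65 C)
ligation_ng_per_ul = 20.0    # about 20 ng/ul (circularization)
annealing_ul = 10.0          # hypothetical annealing-reaction volume

final_ul = total_volume_after_dilution(annealing_ng_per_ul, ligation_ng_per_ul, annealing_ul)
print(f"dilute {annealing_ul:.0f} ul to {final_ul:.0f} ul "
      f"({final_ul / annealing_ul:.0f}-fold; add {final_ul - annealing_ul:.0f} ul)")
```

Diluting before ligation is the usual way to favor closure of each dimer on itself over joining to further vector halves: intermolecular encounters become rarer as concentration falls, while intramolecular closure does not depend on concentration.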

M13IX50 was modified such that it did not produce a 
functional polypeptide for the DAN monoclonal antibody. To 
do this, about eight amino acids were changed within the 
variable region of each chain by mutagenesis. The Lc 
variable region was mutagenized using the oligonucleotide 
5'-CTGAACCTGTCTGGGACCACAGTTGATGCTATAGGATCAGATCTAGAATTCATTTAGAGACTGGCCTGGCTTCTGC-3' 
(SEQ ID NO: 68). The Hc sequence was mutagenized with the 
oligonucleotide 
5'-TCGACCGTTGGTAGGAATAATGCAATTAATGGAGTAGCTCTAAATTCAGAATTCATCTACACCCAGTGCATCCAGTAGCT-3' 
(SEQ ID NO: 69). An additional mutation was also introduced 
into M13IX50 to yield the final form of the vector. During 
construction of an intermediate to M13IX50 (M13IX04, 
described in Example I), a six-nucleotide sequence was 
duplicated in oligonucleotide 027 and its complement 032. 
This sequence, 5'-TTACCG-3', was deleted by mutagenesis using 
the oligonucleotide 5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ 
ID NO: 70). The resultant vector was designated M13IX53. 
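The effect of the deletion oligonucleotide can be checked against the sequence listing. In the sketch below, the 40-nucleotide stretch is copied from SEQ ID NO: 1 (M13IX30, positions 6311-6350), which still carries the duplicated TTACCG; the reverse complement of SEQ ID NO: 70 should match that stretch only after one copy of the repeat is removed. Variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 1 (M13IX30), positions 6311-6350, carrying TTACCG twice in tandem.
template = "GCACTGGCACTCTTACCGTTACCGTTACTGTTTACCGCTG"
oligo_070 = "GGTAAACAGTAACGGTAAGAGTGCCAG"  # SEQ ID NO: 70, written 5'->3'

probe = reverse_complement(oligo_070)        # same strand as the listed template
deleted = template.replace("TTACCG", "", 1)  # remove one copy of the repeat

print(probe in template)  # the oligo does not match the duplicated form
print(probe in deleted)   # but matches once one TTACCG has been removed
```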

M13IX53 can be produced as a single-stranded form and 
contains all the functional elements of the previously 
described M13IXHL vector except that it does not express 
functional antibody heteromers. The single-stranded vector 
can be hybridized to populations of single-stranded Hc and 
Lc encoding sequences for their incorporation into the 
vector by mutagenesis. Populations of single-stranded Hc 
and Lc encoding sequences can be produced by one skilled in 
the art from the PCR products described in Example I or by 
other methods known to one skilled in the art using the 
primers and teachings described therein. The resultant 
vectors with Hc and Lc encoding sequences randomly 
incorporated are propagated and screened for desired 
binding specificities as described in Example I. 

Other vectors similar to M13IX53 and the vectors from which 
it is derived, M13IX11 and M13IX30, have also been produced 
for the incorporation of Hc and Lc encoding sequences 
without restriction. In contrast to M13IX53, these vectors 
contain human antibody sequences for the efficient 
hybridization and incorporation of populations of human Hc 
and Lc sequences. These vectors are briefly described 
below. The starting vectors were either the Hc vector 
(M13IX30) or the Lc vector (M13IX11) previously described. 

M13IX32 was generated from M13IX30 by removing the 
six-nucleotide redundant sequence 5'-TTACCG-3' described above 
and by mutating the leader sequence to increase secretion 
of the product. The oligonucleotide used to remove the 
redundant sequence is the same as that given above. The 
mutation in the leader sequence was generated using the 
oligonucleotide 5'-GGGCTTTTGCCACAGGGGT-3'. This mutagenesis 
resulted in the A residue at position 6353 of M13IX30 being 
changed to a G residue. 
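The A residue at position 6353 of M13IX30 can be read directly from SEQ ID NO: 1, where each line lists sixty bases in groups of ten followed by the coordinate of the last base on that line. The helper below assumes exactly that layout; its name is hypothetical.

```python
def base_at(listing_line: str, position: int) -> str:
    """Return the base at an absolute position from one sequence-listing line.

    Assumes space-separated groups of bases followed by the coordinate of the
    line's last base, as in the sequence listing of this document.
    """
    *groups, end_text = listing_line.split()
    bases = "".join(groups)
    end = int(end_text)
    start = end - len(bases) + 1
    if not start <= position <= end:
        raise ValueError(f"position {position} is outside {start}-{end}")
    return bases[position - start]


# Line of SEQ ID NO: 1 (M13IX30) covering positions 6301-6360.
line_6360 = ("AAGCACTATT GCACTGGCAC TCTTACCGTT ACCGTTACTG "
             "TTTACCGCTG TGAGAAAAGC 6360")
print(base_at(line_6360, 6353))  # A
```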

A decapeptide tag for affinity purification of 
antibody fragments was incorporated in the proper reading 
frame at the carboxy-terminal end of the Hc expression site 
in M13IX32. The oligonucleotide used for this mutagenesis 
was 5'-CGCCTTCAGCCTAAGAAGCGTAGTCCGGAACGTCGTACGGGTAGGATCCACTAG-3' 
(SEQ ID NO: 71). The resultant vector was 
designated M13IX33. Modifications to this or other vectors 
are envisioned which include various features known to one 
skilled in the art. For example, a peptidase cleavage site 
can be incorporated following the decapeptide tag, which 
allows the antibody to be cleaved from the gene VIII 
portion of the fusion protein. 
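The tag sequence is not spelled out in the text, but it can be read from the mutagenic oligonucleotide. The sketch below reverse-complements SEQ ID NO: 71 onto the strand shown in SEQ ID NO: 1, strips the flanks that match the existing CTAGTGGATCC...TAGGCTGAAGGCG junction of M13IX30, and translates the remaining 30 nucleotides with the standard genetic code. It assumes the insert is read in frame from its first base; under that assumption it encodes YPYDVPDYAS, which contains the widely used influenza hemagglutinin (HA) epitope YPYDVPDYA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons starting at the first base of seq."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        a, b, c = (BASES.index(base) for base in seq[i:i + 3])
        protein.append(AMINO_ACIDS[16 * a + 4 * b + c])
    return "".join(protein)


oligo_071 = "CGCCTTCAGCCTAAGAAGCGTAGTCCGGAACGTCGTACGGGTAGGATCCACTAG"  # SEQ ID NO: 71
coding = reverse_complement(oligo_071)  # same strand as SEQ ID NO: 1

# Flanks copied from SEQ ID NO: 1 (M13IX30) at the end of the Hc expression site.
left, right = "CTAGTGGATCC", "TAGGCTGAAGGCG"
assert coding.startswith(left) and coding.endswith(right)

insert = coding[len(left):-len(right)]
print(len(insert), translate(insert))  # 30 YPYDVPDYAS
```

In this reading the insert sits directly upstream of a TAG (amber) codon, which is followed in frame by codons for Ala-Glu-Gly-Asp-Asp, the start of the mature gene VIII protein.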

M13IX34 (SEQ ID NO: 3) was created from M13IX33 by 
cloning in the gene encoding a human IgG1 heavy chain. The 
reading frame of the variable region was changed and a stop 
codon was introduced to ensure that a functional 
polypeptide would not be produced. The oligonucleotide 
used for the mutagenesis of the variable region was 
5'-CACCGGTTCGGGGAATTAGTCTTGACCAGGCAGCCCAGGGC-3' 
(SEQ ID NO: 72). The complete nucleotide sequence of this 
vector is shown in Figure 4 (SEQ ID NO: 3). 

Several vectors of the M13IX11 series were also 
generated to contain modifications similar to those 
described for the vectors M13IX53 and M13IX34. The 
promoter region in M13IX11 was mutated to conform to the -35 
consensus sequence to generate M13IX12. The 
oligonucleotide used for this mutagenesis was 
5'-ATTCCACACATTATACGAGCCGGAAGCATAAAGTGTCAAGCCTGGGGTGCC-3' 
(SEQ ID NO: 73). A human kappa light chain sequence was cloned into 
M13IX12 and the variable region subsequently deleted to 
generate M13IX13 (SEQ ID NO: 4). The complete nucleotide 
sequence of this vector is shown in Figure 5 (SEQ ID NO: 4). 
A similar vector, designated M13IX14, was also 
generated in which the human lambda light chain was 
inserted into M13IX12 followed by deletion of the variable 
region. The oligonucleotides used for the variable region 
deletion of M13IX13 and M13IX14 were 
5'-CTGCTCATCAGATGGCGGGAAGAGCTCGGCCATGGCTGGTTG-3' (SEQ ID NO: 74) 
and 5'-GAACAGAGTGACCGAGGGGGCGAGCTCGGCCATGGCTGGTTG-3' 
(SEQ ID NO: 75), respectively. 
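The promoter mutation that produced M13IX12 can be read the same way. The sketch compares the promoter stretch of M13IX11 (SEQ ID NO: 2, positions 6131-6181) with the reverse complement of SEQ ID NO: 73. The comparison shows three substitutions: the upstream hexamer TTTACA becomes TTGACA, the E. coli -35 consensus, and the downstream hexamer TATGTT becomes TATAAT, the -10 consensus. Names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def substitutions(before: str, after: str) -> list[tuple[int, str, str]]:
    """List (offset, old base, new base) for each mismatch of equal-length strings."""
    if len(before) != len(after):
        raise ValueError("sequences must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Promoter stretch of M13IX11, SEQ ID NO: 2 positions 6131-6181.
template = "GGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAAT"
oligo_073 = "ATTCCACACATTATACGAGCCGGAAGCATAAAGTGTCAAGCCTGGGGTGCC"  # SEQ ID NO: 73
mutant = reverse_complement(oligo_073)

for offset, old, new in substitutions(template, mutant):
    print(f"offset {offset}: {old} -> {new}")
print(template[12:18], "->", mutant[12:18])  # upstream (-35) hexamer
print(template[36:42], "->", mutant[36:42])  # downstream (-10) hexamer
```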



The Hc and Lc vectors or modified forms thereof can be 
combined using the methods described in Example I to 
produce a single vector similar to M13IX53 that allows the 
efficient incorporation of human Hc and Lc encoding 
sequences by mutagenesis. An example of such a vector is 
the combination of M13IX13 with M13IX34. The complete 
nucleotide sequence of this vector, M13IX60, is shown in 
Figure 6 (SEQ ID NO: 5). 



Additional modifications to any of the previously 
described vectors can also be performed to generate vectors 
which allow the efficient incorporation and surface 
expression of Hc and Lc sequences. For example, to 
eliminate the need for uracil selection against wild-type 
template during mutagenesis procedures, the variable region 
locations within the vectors can be substituted by a set of 
palindromic restriction enzyme sites (i.e., two similar 
sites in opposite orientation). The palindromic sites will 
loop out and hybridize together during the mutagenesis and 
thus form a double-stranded substrate for restriction 
endonuclease digestion. Cleavage of the site results in 
the destruction of the wild-type template. The variable 
regions of the inserted Hc or Lc sequences will not be 
affected since they will be in single-stranded form. 
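Read along one strand, two similar sites in opposite orientation amount to a recognition sequence followed downstream by its reverse complement, which lets the strand fold back into a short double-stranded stem that an enzyme can cut. The sketch below tests a single-stranded sequence for that arrangement. The 6-base site GAGACG, the loop length and the flanking sequences are hypothetical, as is the function name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def forms_site_stem(strand: str, site: str, min_loop: int = 3) -> bool:
    """True if a copy of `site` is followed downstream on the same single strand
    by its reverse complement, separated by at least `min_loop` bases.

    Such a pair can anneal antiparallel into a hairpin whose stem is a short
    double-stranded region containing the recognition sequence.
    """
    partner = reverse_complement(site)
    start = strand.find(site)
    while start != -1:
        if strand.find(partner, start + len(site) + min_loop) != -1:
            return True
        start = strand.find(site, start + 1)
    return False


site = "GAGACG"  # hypothetical non-palindromic recognition sequence
opposite = "TTTT" + site + "AAAAAAAA" + reverse_complement(site) + "TTTT"
same = "TTTT" + site + "AAAAAAAA" + site + "TTTT"
print(forms_site_stem(opposite, site), forms_site_stem(same, site))  # True False
```

Two copies in the same orientation cannot pair with each other, which is why the sites must be placed in opposite orientation for the wild-type template to present a cleavable duplex.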

Following the methods of Example I, single-stranded Hc 
or Lc populations can be produced by a variety of methods 
known to one skilled in the art. For example, the PCR 
primers described in Example I can be used in asymmetric 
PCR to generate such populations. See Gelfand et al., "PCR 
Protocols: A Guide to Methods and Applications", Ed. by 
M.A. Innis (1990), which is incorporated herein by 
reference. Asymmetric PCR is a PCR method that 
differentially amplifies only a single strand of the 
double-stranded template. Such differential amplification is 
accomplished by decreasing the amount of primer for the 
undesired strand about 10-fold compared to that for the 
desired strand. Alternatively, single-stranded 
populations can be produced from double-stranded PCR 
products generated as described in Example I except that 
the primer(s) used to generate the undesired strand of 
the double-stranded products are first phosphorylated at the 
5' end with a kinase. The resultant products can then be 
treated with a 5' to 3' exonuclease, such as lambda 
exonuclease (BRL, Bethesda, MD), to digest away the unwanted 
strand. 
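The effect of limiting one primer can be seen in a deliberately idealized model of asymmetric PCR. The sketch below assumes perfect efficiency (every strand is copied each cycle while primer remains) and treats primers as the only limiting reagent; the 10-fold primer ratio follows the text, while the molecule counts and the function name are hypothetical.

```python
def asymmetric_pcr(template: int, excess_primer: int, limiting_primer: int,
                   cycles: int) -> tuple[int, int]:
    """Idealized asymmetric PCR; returns (desired, undesired) strand counts.

    The excess primer copies the undesired strand into new desired strands;
    the limiting primer (about 10-fold less) copies the desired strand into
    new undesired strands. Each cycle every strand is copied while primer lasts.
    """
    desired = undesired = template  # one double-stranded template = one of each strand
    for _ in range(cycles):
        new_desired = min(undesired, excess_primer)
        new_undesired = min(desired, limiting_primer)
        excess_primer -= new_desired
        limiting_primer -= new_undesired
        desired += new_desired
        undesired += new_undesired
    return desired, undesired


desired, undesired = asymmetric_pcr(template=1, excess_primer=10_000,
                                    limiting_primer=1_000, cycles=30)
print(desired, undesired, "single-stranded excess:", desired - undesired)
```

In this model the undesired strand stops accumulating once the limiting primer is spent (at cycle 10 here), after which the desired strand grows linearly rather than exponentially, leaving about 9000 excess single-stranded copies.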

Single-stranded Hc and Lc populations generated by the 
methods described above or by others known to one skilled 
in the art are hybridized to complementary sequences 
encoded in the previously described vectors. The 
populations of sequences are subsequently incorporated 
into a double-stranded form of the vector by polymerase 
extension of the hybridized templates. Propagation and 
surface expression of the randomly combined Hc and Lc 
sequences are performed as described in Example I. 

Although the invention has been described with 
reference to the presently preferred embodiment, it should 
be understood that various modifications can be made 
without departing from the spirit of the invention. 
Accordingly, the invention is limited only by the claims. 
SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: HUSE, WILLIAM D. 

(ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF 
HETEROMERIC RECEPTORS 

(iii) NUMBER OF SEQUENCES: 75 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: PRETTY, SCHROEDER, BRUEGGEMANN & CLARK 

(B) STREET: 444 SO. FLOWER STREET, SUITE 200 

(C) CITY: LOS ANGELES 

(D) STATE: CALIFORNIA 

(E) COUNTRY: UNITED STATES 

(F) ZIP: 90071 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: CAMPBELL, CATHRYN A. 

(B) REGISTRATION NUMBER: 31,815 

(C) REFERENCE/DOCKET NUMBER: P31 SSS2 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 619-535-9001 

(B) TELEFAX: 619-535-8949 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGG TTCGTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 
CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAAGATTTTA GTATTACCCG GTCTGGCAAA ACTTCTTTTG GAAAAGGGTG TGGGTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTGCTTTT GGCGTTATGT ATGTGGATTA GTTGAATGTG GTATTGGTAA ATGTGAAGTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTGCGAAC GTGGTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

GAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

GTCGTGAGGG CAAGGGTTAT TGAGTGAATG AGGAGGTTTG TTAGGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTAGAGCGT TGATCTGTCC TGTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGGGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGGATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGGTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGGGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AAGCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTGCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCGGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGGGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGGTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCGGGGCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TAGTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTGGTGGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTGTGTAAA GGGTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGGTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTGTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGGGGTAC TTGGTTTAAT 3480 

AGCGGTTCTT GGAATGATAA GGAAAGAGAG GGGATTATTG ATTGGTTTGT AGATGCTGGT 3540 

AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGAGAGAAT TAGTTTAGCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGGTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTGATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCGAAAC AATCAGGATT ATATTGATGA ATTGGGATGA TGTGATAATG AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAAGGTTC GGGCAAAGGA TTTAATAGGA GTTGTGGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAAGGT TGGTGAATTG GTTTGTAGTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGCGT GTGAGGGTGG GAGTGTTGGA GGGGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGGTATGA GTTGGGGGAT TAAAGAGTAA TAGGCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCGCTTTTAT 5100 

TAGTGGTGGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TGTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAAGACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATAGGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT GGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TGTGTGAGGG 5940 

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000 

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTGATTAA TGGAGGTGGG 6060 

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTGGTATG TTGTGTGGAA 6180 

TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240 

GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 

AAGCACTATT GCACTGGCAC TCTTACCGTT ACCGTTACTG TTTACCGCTG TGAGAAAAGC 6360 

CGCCCAGGTC CAGCTGCTCG AGTCAGGCCT ATTGTGCGCA GGGGATTGTA CTAGTGGATC 6420 

CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT AGTTTACAGG CAAGTGCTAC 6480 

TGAGTACATT GGCTAGGCTT GGGCTATGGT AGTAGTTATA GTTGGTGGTA GGATAGGGAT 6540 
TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAAGCA ATAGGGAAGA GGGGGGGACG 6600 

GATCGCCCTT CCCAACAGTT GCGCAGCGTG AATGGCGAAT GGCGCTTTGC CTGGTTTCCG 6660 

GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 6720 

GTCGTCCCCT CAAACTGGCA GATGCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 6780 

TATCCCATTA CGGTCAATCG GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 6840 

CTCACATTTA ATGTTGATGA AAGCTGGCTA CAGGAAGGCC AGACGCGAAT TATTTTTGAT 6900 

GGCGTTCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG AATTTTAACA 6960 

AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT CTTCCTGTTT TTGGGGCTTT 7020 

TCTGATTATC AACCGGGGTA CATATGATTG ACATGCTAGT TTTACGATTA CCGTTCATCG 7080 

ATTCTCTTGT TTGCTGCAGA CTCTCAGGCA ATGACCTGAT AGCCTTTGTA GATCTCTCAA 7140 

AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 7200 

GTGATTTGAC TGTCTCCGGG GTTTCTCACG CTTTTGAATC TTTACCTACA CATTACTCAG 7260 

GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 7320 

CTTCTCCCGC AAAAGTATTA GAGGGTGATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 7380 

GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440 

ACGTT 7445 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7317 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGGAATTAAG CTCTAAGCCA 240 

TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTGCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTGGTGTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 
ATGAATCTTT GTAGGTGTAA TAATGTTGTT CGGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CGAGTTCTTA AAATCGCATA AGGTAATTCA 840 

GAATGATTAA AGTTGAAATT AAAGGATGTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

GTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTGAAG ATTAGTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTGTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCGT TTAACTCCCT GGAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCGCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT AGTGGTGACG AAACTCAGTG TTAGGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC GTGAGTAGGG TGATAGAGGT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTGTGAGG GTGTTAATAG TTTGATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

GAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATG AAAAGGGATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCGAA TGGTGTGACG TGGGTGAAGG TGGTGTGAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGGGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGGGG GTGTGAGGGA GGGGGTTGGG GTGGTGGGTG TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TAGAGTGTGA GGGTAAAGGG AAAGTTGATT GTGTGGGTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGGTGTAA TTCCCAAATG GGTGAAGTGG GTGACGGTGA TAATTGACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA AGGATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 
TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTGCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT GTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGGA AATTAGGCTC TGGAAAGACG 3240 

GTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGGTTGAAAA GGTGGGGCAA GTGGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC GGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA GGGGTTGCTT GTTGTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT TTTTCTTGTT GAGGAGTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGGTGGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG GTTTTTGTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC AGGGGTTGTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGAGTGTTGT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTGACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATGA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATGAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 

TGTTTTACGT GGTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTGC GGTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTGTAATACT TCTAAATCCT GAAATGTATT ATGTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGGTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATGTT GTGGTGGTGG TTGGTTGGGT ATTTTTAATG GGGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG GTTTGAGGTC AGAAGGGTTG TATGTGTGTT GGCCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTGCA TGAGCGTTTT TGGTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TAGTAATGAA AGAAGTATTG GTAGAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATGGGTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TGGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 

ATTTGGGTGA TGGTTGAGGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940 

CGAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000 

GGCGCCGAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060 

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

TTGTGAGCGG ATAACAATTT CACACGCCAA GGAGACAGTC ATAATGAAAT ACCTATTGGC 6240 

TACGGCAGCC GCTGGATTGT TATTACTCGC TGCCCAACCA GCCATGGCCG AGCTCGTGAT 6300 

GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAATT CTAGAACGCG TGAGTTGGGA 6360 

CTGGCCGTCG TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA AGCTTAATCG 6420 

CCTTGGAGAA TTGCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATGGGCC 6480 

TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC CGGCACCAGA 6540 

AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATAGGG TGGTGGTGGG 6600 

CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA CCTATCCCAT 6660 

TACGGTCAAT GGGGCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTAGT CGCTCACATT 6720 

TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG ATGGCGTTCC 6780 

TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA GAAAATATTA 6840 
ACGTTTACAA TTTAAATATT TGCTTATACA ATCTTCCTGT TTTTGGGGCT TTTCTGATTA 6900 

TCAACCGGGG TAGATATGAT TGACATGCTA GTTTTAGGAT TACCGTTCAT CGATTCTGTT 6960 

GTTTGCTCCA GACTCTCAGG CAATGACCTG ATAGCCTTTG TAGATCTCTC AAAAATAGCT 7020 

ACCCTCTCCG GGATTAATTT ATCAGCTAGA ACGGTTGAAT ATCATATTGA TGGTGATTTG 7080 

ACTGTCTCCG GCCTTTCTCA CCCTTTTGAA TCTTTACCTA CACATTACTC AGGCATTGCA 7140 

TTTAAAATAT ATGAGGGTTC TAAAAATTTT TATCCTTGCG TTGAAATAAA GGCTTCTCGC 7200 

GCAAAAGTAT TACAGGGTCA TAATGTTTTT GGTACAACCG ATTTAGCTTT ATGCTCTGAG 7260 

GCTTTATTGC TTAATTTTGC TAATTCTTTG CCTTGCCTGT ATGATTTATT GGATGTT 7317 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7729 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CGATTTGGGA AATGTATCTA ATGGTGAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGGTACAG GAGGAGATTG AGCAATTAAG CTGTAAGGGA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCGAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATGCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GGCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGGGC TTGGTATAAT CGCTGGGGGT 1200 
GAAAGATGAG TCTTTTAGTG TATTGTTTGG GGTGTTTGGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTGT GTAGGCGTTG GTAGCGTGGT TGC-^TGCTG TGTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGGGTGGGGG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGGAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAAGG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT AGTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCCGTTAA AACTTATTAC GAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGGGAA TGGTGTGAGC TGCGTCAACC TCCTGTGAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTGTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTGG GTGAGGGTGA TAATTGAGGT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTGTATTG ATTGTGAGAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGGG AGTTCTTTTG GGTATTGGGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGGTATTG GTATTTGATT GTTTGTTGGT GTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTGAGTTA ATTGTGGGGT GTAATGGGGT TGGGTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGG TGTTTATTTT GTAAGTGGGA AATTAGGGTG TGGAAAGACG 3240 
CTCGTTAGCG TTGGTAAGAT TGAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGGG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA GGGGTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

AGCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT GTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTGGGTA GTTTATATTC TCTTATTACT GGGTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

AGTGGTAAGA ATTTGTATAA GGCATATGAT ACTAAACAGG CTTTTTGTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTG AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTGTG TGAGAGGTAT GATTTTGATA AATTGACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGGA AGGTTATTGA GTGAGATATA TTGATTTATG TAGTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTGGGGTG TGGGGGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TAGTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTG 4440 

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATGAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTGTAATACT TGTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACGTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCGAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGGGTTTT TCCTGTTGGA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG GCGATAGTTT GAGTTGTTCT ACTCAGGCAA GTGATGTTAT 5280 
TACTAATCAA AGAAGTATTG GTAGAAGGGT TAATTTGGGT GATGGAGAGA GTGTTTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCGCTTTA ATCGGCCTCG TGTTTAGGTG GGGGTGTGAT TGGAAGGAGG AAAGCAGGTT 5460 

ATAGGTGCTC GTCAAAGCAA CGATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TAGGGGGAGG GTGAGGGGTA CACTTGCCAG CGCCCTAGCG GGGGGTCGTT 5580 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 

GGGGGGTGGG TTTAGGGTTG GGATTTAGTG GTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 

CGTTGGAGTG GAGGTTGTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT GGGGTGGTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940 

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000 

GGGGGGGAAT AGGGAAACCG CGTGTGCCGG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060 

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA GGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCAGCCCAG GCTTTAGACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

TTGTGAGCGG ATAACAATTT CAGACGCGTC ACTTGGCACT GGGGGTCGTT TTACAACGTC 6240 

GTGACTGGGA AAACCCTGGC GTTAGCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 

AAGCACTATT GCAGTGGCAC TCTTACGGTT ACTGTTTACG CCTGTGGCAA AAGCCCAGGT 6360 

GCAGCTGCTC GAGTCGGTCT TCCCCCTGGC ACGCTCCTCC AAGAGCACCT CTGGGGGCAC 6420 

AGCGGCCCTG GGCTGCCTGG TCAAGACTAA TTGCCCGAAC CGGTGACGGT GTCGTGGAAC 6480 

TCAGGCGCCC TGACCAGCGG CGTGCACACC TTCCCGGCTG TCCTACAGTC CTCAGGACTC 6540 

TACTCCCTCA GCAGCGTGGT GACCGTGCCC TCCAGCAGCT TGGGCACCCA GACCTACATC 6600 

TGCAACGTGA ATCACAAGCC CAGCAACACC AAGGTGGACA AGAAAGCAGA GCCCAAATCT 6660 

TGTACTAGTG GATCCTACCC GTACGACGTT CG ^CTACG CTTCTTAGGC TGAAGGCGAT 6720 

GACCCTGCTA AGGCTGCATT CAATAGTTTA CAGGCAAGTG CTACTGAGTA CATTGGCTAC 6780 

GCTTGGGCTA TGGTAGTAGT TATACTTCCT GCTACCATAG GGATTAAATT ATTCAAAAAG 6840 

TTTACGAGCA AGGCTTCTTA AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC 6900 

AGTTGCGGAG CCTGAATGGC GAATGGCGCT TTGCCTGGTT TCCGGCACGA GAAGGGGTGG 6960 

CGGAAAGCTG GCTGGAGTGC GATCTTCCTG AGGCCGATAC GGTCGTCGTC CCCTCAAACT 7020 

GGCAGATGCA CGGTTAGGAT GCGCCCATCT ACACCAACGT AACCTATGCC ATTAGC-GTGA 7080 

ATCCGCCGTT TGTTCCCACG GAGAATCCGA CGGGTTGTTA CTCGCTCACA TTTAATGTTG 7140 

ATGAAAGCTG GGTACAGGAA GGCCAGACGG GAATTATTTT TGATGGGGTT GGTATTGGTT 7200 

AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC 7260 

AATTTAAATA TTTGGTTATA GAATGTTGGT GTTTTTGGGG GTTTTGTGAT TATCAACCGG 7320 
GGTACATATG ATTGACATGC TAGTTTTACG ATTACCGTTC ATCGATTCTG TTGTTTGGTC 7380 

CAGACTCTCA GGCAATGACC TGATAGGCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC 7440 

CGGCATTAAT TTATCAGCTA GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTGTC 7500 

CGGCCTTTCT CACCCTTTTG AATCTTTACC TACACATTAC TCAGGCATTG CATTTAAAAT 7560 

ATATGAGGGT TCTAAAAATT TTTATCCTTG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT 7620 

ATTACAGGGT CATAATGTTT TTGGTACAAG CGATTTAGCT TTATGCTCTG AGGCTTTATT 7680 

GCTTAATTTT GCTAATTCTT TGCCTTGCCT GTATGATTTA TTGGACGTT 7729 
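Each row of these listings appears to hold up to six ten-base blocks followed by the position of the row's last base; SEQ ID NO: 3, for example, closes at position 7729, matching its stated LENGTH of 7729 base pairs. Because the text was recovered by OCR, a row whose counter disagrees with the bases before it is a useful pointer to a transcription error. The Python sketch below checks that arithmetic. The row pattern and the function name `check_rows` are assumptions made for illustration and are not part of the original listing.

```python
import re

# Assumed row layout, inferred from this listing: one to six blocks of up to
# ten bases, separated by spaces, then the position of the row's last base.
ROW = re.compile(r"^((?:[ACGTN]{1,10} ){1,6})(\d+)$")


def check_rows(lines):
    """Return (line_no, stated, computed) for every suspicious row.

    Rows that do not parse (for example OCR debris such as '-^' or a
    garbled counter) are reported with stated and computed set to None.
    The running total is never resynchronised, so after a dropped base or
    an unparseable row every later row disagrees; inspect the first report.
    """
    problems = []
    total = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank separator lines between rows
        match = ROW.match(line)
        if match is None:
            problems.append((line_no, None, None))
            continue
        total += len(match.group(1).replace(" ", ""))
        stated = int(match.group(2))
        if stated != total:
            problems.append((line_no, stated, total))
    return problems


rows = [
    "ACGTACGTAC ACGTACGTAC 20",
    "",
    "ACGTACGTAC 30",
    "ACGT-^ACGT 40",
]
print(check_rows(rows))  # [(4, None, None)]
```

Rows that were split across several lines must be rejoined before checking, so the function is a screening aid for the transcription rather than a way to correct it.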
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7557 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

GGTTGGGAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCCGCAAAAA TGAGCTCTTA TCAAAAGGAG GAATTAAAGG TACTCTCTAA TCCTGACGTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATGCGGT TTGGTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCGGGAG TATTGGACGC TATGCAGTGT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTGTTAC TATGGGTGGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGGGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 



WO 92/06204 

PCT/US91/07149 

GTGGCATTAC GTATTTTACG CGTTTAATGG AAAGTTGGTG ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAAGTGGGT GGAAGGGTGA GGGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAAGGGA TAGAATTAAA GGGTGGTTTT GGAGCGTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTGTGAGT GGGGTGAAAG TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

GTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TGTGAGGGTG GGGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCAGTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT AGTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTGTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATGG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTG AATCGGTTGA ATGTGGGGCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGGCACCT TTATGTATGT ATTTTGTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGGC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT GGGGTATGTG GTTAGTTTTG 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CCTGTTTCTT GCTCTTATTA TTGGGCTTAA 3000 

CTCAATTGTT GTGGGTTATC TCTCTGATAT TAGCGCTCAA TTAGGGTGTG AGTTTGTTGA 3060 

GGGTGTTCAG TTAATTCTCC CGTCTAATGC GCTTCCGTGT TTTTATGTTA TTCTCTCTGT 3120 

AAAGGCTGCT ATTTTGATTT TTGACGTTAA AGAAAAAATG GTTTCTTATT TGGATTGGGA 3180 

TAAATAATAT GGCTGTTTAT TTTGTAACTG GCAAATTAGG CTCTGGAAAG ACGCTCGTTA 3240 

GCGTTGGTAA GATTCAGGAT AAAATTGTAG CTGGGTGGAA AATAGCAACT AATGTTGATT 3300 
TAAGGCTTCA AAACCTCCCG CAAGTCGGGA GGTTCGCTAA AACGCCTCGC GTTCTTAGAA 3360 

TACCGGATAA GCCTTCTATA TCTGATTTGC TTGCTATTGG GCGCGGTAAT GATTCCTACG 3420 

ATGAAAATAA AAACGGCTTG CTTGTTGTCG ATGAGTGCGG TACTTGGTTT AATACCGGTT 3480 

CTTGGAATGA TAAGGAAAGA CAGCCGATTA TTGATTGGTT TCTACATGCT CGTAAATTAG 3540 

GATGGGATAT TATTTTTCTT GTTCAGGAGT TATCTATTGT TGATAAACAG GCGCGTTCTG 3600 

CATTAGCTGA AGATGTTGTT TATTGTCGTC GTCTGGACAG AATTACTTTA CGTTTTGTGG 3660 

GTACTTTATA TTCTCTTATT ACTGGCTCGA AAATGCCTCT GCCTAAATTA CATGTTGGCG 3720 

TTGTTAAATA TGGGGATTCT GAATTAAGCC CTACTGTTGA GCGTTGGCTT TATACTGGTA 3780 

AGAATTTGTA TAACGCATAT GATACTAAAC AGGCTTTTTC TAGTAATTAT GATTCCGGTG 3840 

TTTATTCTTA TTTAACGCCT TATTTATCAC ACGGTCGGTA TTTCAAACCA TTAAATTTAG 3900 

GTGAGAAGAT GAAGCTTACT AAAATATATT TGAAAAAGTT TTCACGCGTT CTTTGTCTTG 3960 

GGATTGGATT TGGATCAGCA TTTACATATA GTTATATAAC CCAACCTAAG CCGGAGGTTA 4020 

AAAAGGTAGT CTCTCAGACC TATGATTTTG ATAAATTCAC TATTGACTCT TCTCAGCGTC 4080 

TTAATGTAAG CTATCGCTAT GTTTTGAAGG ATTCTAAGGG AAAATTAATT AATAGCGACG 4140 

ATTTACAGAA GGAAGGTTAT TCACTCACAT ATATTGATTT ATGTACTGTT TCCATTAAAA 4200 

AAGGTAATTG AAATGAAATT GTTAAATGTA ATTAATTTTG TTTTCTTGAT GTTTGTTTCA 4260 

TCATCTTCTT TTGCTCAGGT AATTGAAATG AATAATTCGC CTCTGCGCGA TTTTGTAACT 4320 

TGGTATTGAA AGGAATGAGG GGAATCCGTT ATTGTTTGTC CCGATGTAAA AGGTACTGTT 4380 

ACTGTATATT CATCTGACGT TAAACCTGAA AATCTACGCA ATTTCTTTAT TTCTGTTTTA 4440 

CGTGCTAATA ATTTTGATAT GGTTGGTTCA ATTCGTTCCA TAATTGAGAA GTATAATGCA 4500 

AACAATCAGG ATTATATTGA TGAATTGCCA TCATCTGATA ATCAGGAATA TGATGATAAT 4560 

TCCGCTCCTT CTGGTGGTTT CTTTGTTCCG CAAAATGATA ATGTTAGTGA AAGTTTTAAA 4620 

ATTAATAACG TTCGGGCAAA GGATTTAATA GGAGTTGTCG AATTGTTTGT AAAGTCTAAT 4680 

ACTTCTAAAT CCTCAAATGT ATTATCTATT GAGGGGTGTA ATCTATTAGT TGTTAGTGGA 4740 

CCTAAAGATA TTTTAGATAA CCTTCCTCAA TTCCTTTCTA CTGTTGATTT GCCAACTGAC 4800 

CAGATATTGA TTGAGGGTTT GATATTTGAG GTTCAGCAAG GTGATGCTTT AGATTTTTCA 4860 

TTTGCTGCTG GCTCTCAGCG TGGCACTGTT GCAGGCGGTG TTAATACTGA CCGCCTCACC 4920 

TGTGTTTTAT CTTCTGCTGG TGGTTCGTTC GGTATTTTTA ATGGCGATGT TTTAGGGCTA 4980 

TCAGTTCGCG CATTAAAGAC TAATAGCCAT TCAAAAATAT TGTCTGTGCC ACGTATTCTT 5040 

ACGCTTTCAG GTCAGAAGGG TTCTATCTCT GTTGGCCAGA ATGTCCCTTT TATTACTGGT 5100 

CGTGTGACTG GTGAATCTGC CAATGTAAAT AATCCATTTC AGACGATTGA GCGTCAAAAT 5160 

GTAGGTATTT CCATGAGCGT TTTTCCTGTT GCAATGGCTG GCGGTAATAT TGTTCTGGAT 5220 

ATTACCAGCA AGGCCGATAG TTTGAGTTCT TCTACTCAGG CAAGTGATGT TATTACTAAT 5280 

CAAAGAAGTA TTGCTACAAC GGTTAATTTG CGTGATGGAC AGACTCTTTT ACTCGGTGGC 5340 
CTCACTGATT ATAAAAACAC TTCTCAAGAT TGTGGGGTAG GGTTGGTGTG TAAAATGGGT 5400 

TTAATCGGCC TCCTGTTTAG CTCCCGCTCT GATTCCAACG AGGAAAGCAC GTTATACGTG 5460 

GTCGTCAAAG CAAGGATAGT AGGGGGGGTG TAGGGGGGGA TTAAGCGCGG CGGGTGTGGT 5520 

GGTTACGCGC AGCGTGACCG CTACACTTGC CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT 5580 

CTTCCCTTCC TTTGTGGGGA GGTTGGGGGG CTTTGGCCGT GAAGGTCTAA ATCGGGGGCT 5640 

CCCTTTAGGG TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 5700 

TGATGGTTGA CGTAGTGGGG GATGGGGGTG ATAGACGGTT TTTCGCCGTT TGACGTTGGA 5760 

GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA ACAACACTCA ACCCTATCTC 5820 

GGGGTATTGT TTTGATTTAT AAGGGATTTT GGCGATTTCG GAACCACCAT CAAACAGGAT 5880 

TTTCGCCTGC TGGGGCAAAC CAGCGTGGAC CGCTTGCTGC AACTCTCTCA GGGCCAGGCG 5940 

GTGAAGGGGA ATGAGCTGTT GCCCGTCTCG CTGGTGAAAA GAAAAACCAC CCTGGCGCCC 6000 

AATACGCAAA CCGCCTCTCC CCGCGCGTTG GCCGATTCAT TAATGCAGCT GGCACGACAG 6060 

GTTTCGCGAC TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT AGCTC/ ^A 6120 

TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG GAATTGTGAG 6180 

CGGATAACAA TTTCACACGG CAAGGAGAGA GTCATAATGA AATACCTATT GCCTACGGCA 6240 

GCCGCTGGAT TGTTATTACT CGCTGCCCAA CCAGCCATGG CCGAGCTCTT CCCGCCATCT 6300 

GATGAGCAGT TGAAATCTGG AACTGCCTCT GTTGTGTGCC TGCTGAATAA CTTCTATCCC 6360 

AGAGAGGCCA AAGTACAGTG GAAGGTGGAT AACGCCCTCC AATCGGGTAA CTCCCAGGAG 6420 

AGTGTCACAG AGCAGGACAG CAAGGACAGC ACCTACAGCC TCAGCAGCAC CCTGACGCTG 6480 

AGCAAAGCAG ACTACGAGAA ACACAAAGTC TAGGCCTGCG AAGTCACCCA TCAGGGCCTG 6540 

AGCTCGCCCG TCACAAAGAG CTTCAACAGG GGAGAGTGTT CTAGAACGCG TCACTTGGCA 6600 

CTGGCCGTCG TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA AGCTTAATGG 6660 

CCTTGCAGAA TTCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATCGCCC 6720 

TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC GGGGAGCAGA 6780 

AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACGG TCGTCGTCCG 6840 

CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA CCTATGGGAT 6900 

TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT CGCTCACATT 6960 

TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG ATGGGGTTGG 7020 

TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA CAAAATATTA 7080 

ACGTTTACAA TTTAAATATT TGCTTATACA ATCTTGGTGT TTTTGGGGGT TTTGTGATTA 7140 

TCAACCGGGG TACATATGAT TGACATGCTA GTTTTACGAT TACCGTTCAT CGATTCTCTT 7200 

GTTTGCTCGA GACTCTCAGG CAATGAGCTG ATAGGGTTTG TAGATGTGTG AAAAATAGGT 7260 

ACCCTCTCCG GCATTAATTT ATCAGCTAGA ACGGTTGAAT ATCATATTGA TGGTGATTTG 7320 

ACTGTCTCGG GCCTTTCTCA CCCTTTTGAA TCTTTACGTA GACATTACTC AGGGATTGGA 7380 
TTTAAAATAT ATGAGGGTTC TAAAAATTTT TATCGTTGCG TTGAAATAAA GGCTTCTCCC 7440 
GCAAAAGTAT TACAGGGTCA TAATGTTTTT GGTACAACCG ATTTAGCTTT ATGCTCTGAG 7500 
GCTTTATTGC TTAATTTTGC TAATTCTTTG CCTTGCCTGT ATGATTTATT GGATGTT 7557 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8118 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGGAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAAGGAGGGT TATGATAGTG TTGGTGTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTAGCTGTAA TAATGTTGTT GCGTTAGTTG GTTTTATTAA GGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTGGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGGCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
ATTCACCTGG AAAGCAAGCT GATAAACCGA TAGAATTAAA GGCTCCTTTT GGAGGCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGGAA AACCGCATAG AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGGGT TGTAGTTTGT AGTGGTGACG AAAGTGAGTG TTACGGTAGA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTGTGA GGGTGGCGGT ACTAAACCTC GTGAGTAGGG TGATACAGCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACGGGGGTA ATGGTAATGG TTGTGTTGAG GAGTGTGAGC GTGTTAATAC TTTGATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

GAAGGGAGTG AGCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGAGGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATGGATTCG TTTGTGAATA TCAAGGGGAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTGTGGT GGGGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGGGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGGTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG AGGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTGAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCG TTGTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGAGTTAT GTATTGTTGA TAAAGAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTAGAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG GTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTGTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTGACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT __________ ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 

TGTTTTAGGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTGTAATAGT TGTAAATGGT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AAGTGAGGAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGGGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

GGTGAGGTCT GTTTTATGTT GTGGTGGTGG TTGGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTG TATCTGTGTT GGGCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATGTGCCAA TGTAAATAAT CCATTTCAGA GGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGGGTTTT TCGTGTTGCA ATGGGTGGGG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA GTGTTTTAGT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TGCAAGGAGG AAAGGAGGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGGGAG CGCCCTAGCG CGGGCTGGTT 5580 




TCGCTTTCTT CCCTTGCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 

ATTTGGGTGA TGGTTGAGGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT GGGGGTTTGA 5760 

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

CTATCTCGGG CTATTGTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CGAGGATGAA 5880 

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940 

GCAGGGGGTG AAGGGGAATC AGCTGTTGCC GGTCTCGCTG GTGAAAAGAA AAAGGAGGGT 6000 

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060 

AGGACAGGTT TGGGGAGTGG AAAGGGGGGA GTGAGGGGAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCGCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

TTGTGAGGGG ATAAGAATTT CACACGGGAA GGAGACAGTC ATAATGAAAT ACCTATTGCC 6240 

TACGGCAGCC GCTGGATTGT TATTACTCGC TGCCCAACCA GCCATGGCCG AGCTCTTCCC 6300 

GCCATGTGAT GAGCAGTTGA AATCTGGAAC TGCCTCTGTT GTGTGCCTGC TGAATAACTT 6360 

CTATCCCAGA GAGGCCAAAG TACAGTGGAA GGTGGATAAC GCCCTCCAAT CGGGTAACTC 6420 

GGAGGAGAGT GTGACAGAGC AGGACAGCAA GGACAGCACC TACAGCCTCA GCAGCACCCT 6480 

GACGCTGAGC AAAGCAGACT ACGAGAAACA CAAAGTCTAC GCCTGCGAAG TCACCCATCA 6540 

GGGCCTGAGC TCGCCCGTCA CAAAGAGCTT CAACAGGGGA GAGTGTTCTA GAACGCGTCA 6600 

CTTGGCACTG GCCGTCGTTT TACAACGTCG TGAGTGGGAA AACCCTGGCG TTACCCAAGC 6660 

TTTGTACATG GAGAAAATAA AGTGAAACAA AGCACTATTG CACTGGCACT CTTACCGTTA 6720 

CTGTTTACCC CTGTGGCAAA AGCCGCCTCG ACCAAGGGCC CATCGGTCTT CCCCCTGGCA 6780 

CGCTCCTCCA AGAGCACCTC TGGGGGCACA GCGGCCCTGG GCTGCCTGGT CAAGACTAAT 6840 

TCCCCGAACC GGTGACGGTG TCGTGGAACT CAGGCGCCCT GACCAGCGGC GTGCACACCT 6900 

TCCCGGCTGT CCTACAGTCG TCAGGACTCT ACTCCCTCAG CAGCGTGGTG ACCGTGCCCT 6960 

CCAGCAGCTT GGGCACCCAG ACCTACATCT GCAACGTGAA TCACAAGCCC AGGAACAGCA 7020 

AGGTGGACAA GAAAGCAGAG CCCAAATCTT GTACTAGTGG ATCCTACCCG TACGACGTTC 7080 

CGGACTACGC TTCTTAGGCT GAAGGCGATG ACCCTGCTAA GGCTGCATTC AATAGTTTAC 7140 

AGGCAAGTGC TACTGAGTAC ATTGGCTACG CTTGGGCTAT GGTAGTAGTT ATAGTTGGTG 7200 

CTACCATAGG GATTAAATTA TTCAAAAAGT TTACGAGCAA GGCTTCTTAA GCAATAGCGA 7260 

AGAGGCCCGC AGCGATCGCC CTTCCCAACA GTTGCGCAGC CTGAATGGCG AATGGCGCTT 7320 

TGCGTGGTTT GCGGCACCAG AAGCGGTGCC GGAAAGCTGG CTGGAGTGCG ATCTTCCTGA 7380 

GGCCGATACG GTCGTCGTCC CCTCAAACTG GCAGATGCAC GGTTACGATG CGCCCATCTA 7440 

CACCAACGTA ACCTATCCGA TTACGGTCAA TCCGCCGTTT GTTCCCACGG AGAATCCGAC 7500 

GGGTTGTTAC TCGCTCACAT TTAATGTTGA TGAAAGCTGG CTACAGGAAG GCCAGACGCG 7560 

AATTATTTTT GATGGCGTTC CTATTGGTTA AAAAATGAGC TGATTTAAGA AAAATTTAAC 7620 
GCGAATTTTA ACAAAATATT AAGGTTTACA ATTTAAATAT TTGGTTATAG AATCTTCCTG 7680 

__________ TTTTCTGATT ATCAAGCGGG GTACATATGA TTGACATGCT AGTTTTACGA 7740 

TTACCGTTCA TCGATTCTCT TGTTTGCTCC AGACTCTCAG GCAATGACCT GATAGGGTTT 7800 

GTAGATCTCT GAAAAATAGC TACGGTCTGC GGGATTAATT TATCAGCTAG AACGGTTGAA 7860 

TATCATATTG ATGGTGATTT GACTGTCTCC GGCCTTTCTC ACCGTTTTGA ATCTTTAGGT 7920 

ACACATTACT CAGGCATTGC ATTTAAAATA TATGAGGGTT CTAAAAATTT TTATCCTTGC 7980 

GTTGAAATAA AGGCTTCTCC CGCAAAAGTA TTACAGGGTC ATAATGTTTT TGGTACAACC 8040 

GATTTAGCTT TATGCTCTGA GGCTTTATTG CTTAATTTTG CTAATTCTTT GCCTTGCCTG 8100 

TATGATTTAT TGGACGTT 8118 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace (5, "") 

(D) OTHER INFORMATION: /note= "S REPRESENTS EQUAL MIXTURE 
OF G AND C" 

(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace (6, "") 

(D) OTHER INFORMATION: /note= "M REPRESENTS EQUAL MIXTURE 
OF A AND C" 

(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace (8, "") 

(D) OTHER INFORMATION: /note= "R REPRESENTS EQUAL MIXTURE 
OF A AND G" 

(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace (11, "") 

(D) OTHER INFORMATION: /note= "K REPRESENTS EQUAL MIXTURE 
OF G AND T" 

(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace (20, "") 

(D) OTHER INFORMATION: /note= "W REPRESENTS EQUAL MIXTURE 
OF A AND T" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AGGTSMARCT KCTCGAGTCW GG 22 
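The feature notes for SEQ ID NO: 6 define five ambiguity codes (S, M, R, K and W), each an equal mixture of two bases, so this single 22-base oligonucleotide stands for a pool of concrete sequences. The Python sketch below enumerates that pool. It assumes the codes carry exactly the meanings given in the notes, which agree with the standard IUPAC nucleotide codes; the names `CODES` and `expand` are illustrative only. It also checks that the primers listed individually as SEQ ID NO: 7 and SEQ ID NO: 14 fall inside the pool.

```python
from itertools import product

# Ambiguity codes as defined by the /note entries for SEQ ID NO: 6 (each an
# equal mixture of two bases); these agree with the standard IUPAC codes.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "GC", "M": "AC", "R": "AG", "K": "GT", "W": "AT",
}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate oligo."""
    seq = degenerate.replace(" ", "")
    options = [CODES[base] for base in seq]  # raises KeyError on an unknown code
    return ["".join(combo) for combo in product(*options)]


pool = expand("AGGTSMARCT KCTCGAGTCW GG")
print(len(pool))  # 32, i.e. 2**5 for the five two-base positions

# Two of the individually listed primers fall inside this pool.
print("AGGTCCAGCTGCTCGAGTCTGG" in pool)  # SEQ ID NO: 7 -> True
print("AGGTCCAACTTCTCGAGTCAGG" in pool)  # SEQ ID NO: 14 -> True
```

Because each degenerate position doubles the number of species, the effective concentration of any one 22-mer in an equimolar synthesis is 1/32 of the total oligonucleotide.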
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AGGTCCAGCT GCTCGAGTCT GG 22 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 
AGGTCCAGCT GCTCGAGTCA GG 22 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
AGGTCCAGCT TCTCGAGTCT GG 22 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 
AGGTCCAGCT TCTCGAGTCA GG 22 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
AGGTCCAACT GCTCGAGTCT GG 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

AGGTCCAACT GCTCGAGTCA GG 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AGGTCCAACT TCTCGAGTCT GG 22
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGGTCCAACT TCTCGAGTCA GG 22 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE:

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(5..6, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(8, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(11, "")

(D) OTHER INFORMATION: /note= "N=INOSINE"

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(B) LOCATION: replace(20, "")

(D) OTHER INFORMATION: /note= "W REPRESENTS EQUAL MIXTURE
OF A AND T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
AGGTNNANCT NCTCGAGTCW GG 22
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CTATTAACTA GTAACGGTAA CAGTGGTGCC TTGCCCCA 38 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
AGGCTTACTA GTACAATCCC TGGGCACAAT 30 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CCAGTTCCGA GCTCGTTGTG ACTCAGGAAT CT 32 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CCAGTTCCGA GCTCGTGTTG ACGCAGCCGC GC 32 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

CCAGTTCCGA GCTCGTGCTC ACCCAGTCTC CA 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 32 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
CCAGTTCCGA GCTCCAGATG ACCCAGTCTC CA 32 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CCAGATGTGA GCTCGTGATG ACCCAGACTC CA 32 
(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CCAGATGTGA GCTCGTCATG ACCCAGTCTC CA 32 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCAGTTCCGA GCTCGTGATG ACACAGTCTC CA 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

GCAGCATTCT AGAGTTTCAG CTCCAGCTTG CC 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
GCGCCGTCTA GAATTAACAC TCATTCCTGT TGAA 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 35
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TACGAGCAAG GCTTCTTA 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT 39 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT 36



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 35 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 34

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
ATCGCCTTCA GCCTAG 16
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CATTTTTGCA GATGGCTTAG A 21
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TAGCATTAAC GTCCAATA 
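Several of the listed oligonucleotides appear to be designed to pair with one another. As a hedged illustration, the Python below checks two such relationships with a simple reverse-complement helper (`reverse_complement` is a hypothetical name, not from the original): SEQ ID NO: 36 against SEQ ID NO: 27, and SEQ ID NO: 35 against the junction of SEQ ID NO: 27 and 28. It assumes that the illegible character at position 8 of SEQ ID NO: 35 in the scanned text is A.

```python
# Translation table mapping each unambiguous base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence; spaces are ignored."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


seq27 = "GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC".replace(" ", "")
seq28 = "ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA".replace(" ", "")
seq35 = "CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC"  # position 8 read as A (assumption)
seq36 = "ATCGCCTTCA GCCTAG"

# SEQ ID NO: 36 pairs with an internal stretch of SEQ ID NO: 27.
print(reverse_complement(seq36) in seq27)                        # True
# SEQ ID NO: 35 spans the 3' end of SEQ ID NO: 27 and the 5' end of SEQ ID NO: 28.
print(reverse_complement(seq35) == seq27[-17:] + seq28[:17])     # True
```

A bridging oligonucleotide of this kind would let the two flanking oligonucleotides be annealed and joined in register; whether that was the authors' purpose is not stated in this part of the listing.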
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ATATATTTTA GTAAGCTTCA TCTTCT 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GACAAAGAAC GCGTGAAAAC TTT 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC 
(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
GGCGAAAGGG AATTGTGGAA GGCGATTAAG CTTGGGTAAC GCC 43 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG 36 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 42 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT 42
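The first ten bases of SEQ ID NO: 45 are partly illegible in the scanned text, and the reading TGAAACAAAG is an editorial assumption. A quick overlap check, sketched below in Python with hypothetical helper names (not from the original), shows that this reading agrees with the reverse complement of SEQ ID NO: 51, and that SEQ ID NO: 52 pairs with an internal stretch of SEQ ID NO: 45.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence; spaces are ignored."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.replace(" ", "")))


# Editorial reading of SEQ ID NO: 45 (bases 8-10 taken as AAG).
seq45 = "TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT".replace(" ", "")
seq51 = "GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA"
seq52 = "TAACGGTAAG AGTGCCAGTG C"

# The last 21 bases of revcomp(SEQ ID NO: 51) should equal the first 21
# bases of SEQ ID NO: 45, which include the reconstructed positions.
print(reverse_complement(seq51)[-21:] == seq45[:21])   # True
# SEQ ID NO: 52 is complementary to an internal stretch of SEQ ID NO: 45.
print(reverse_complement(seq52) in seq45)              # True
```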
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT GC 42 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



WO 92/06204 



PCT/US91/07149




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
GGCACAATAG GCCTGAGTCG AGCAGCTGGA CCAGGGCGGC TT 42 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA 42 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA 42 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TAACGGTAAG AGTGCCAGTG C 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

CACCTTCATG AATTCGGCAA GGAGACAGTC AT 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
AATTCGCCAA GGAGACAGTC AT 22 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 39 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

TCTAGAACGC GTC 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TTCAGGTTGA AGCTTACGCG TTCTAGAATT AACACTCATT CCTGT 45 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG 39 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGCC 39 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
GTAGGCAATA GGTATTTGAT TATGAGTGTC CTTGGCG 37 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 30 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 36 
(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
GCCAGTGCCA AGTGACGCGT TCTA 24 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
ATATATTTTA GTAAGCTTCA TCTTCT 26 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GACAAAGAAC GCGTGAAAAC TTT 23 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 76 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
CTGAACCTGT CTGGGACCAC AGTTGATGCT ATAGGATCAG ATCTAGAATT CATTTAGAGA 60 
GTGGCGTGGC TTGTGC 76
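Multi-line entries such as SEQ ID NO: 68 print bases in blocks of ten, with the cumulative base count at the end of each line. The Python below is a small sketch for joining such a block and cross-checking the running counts against the declared LENGTH field; `check_sequence_block` is a hypothetical helper, not part of any sequence-listing tool, and it assumes the sequence lines contain only bases, spaces and an optional trailing count.

```python
def check_sequence_block(lines: list[str], declared_length: int) -> str:
    """Join a listing-style sequence block and verify its base counts.

    Each line holds groups of bases, optionally followed by the cumulative
    number of bases printed so far. Raises ValueError on any mismatch.
    """
    sequence = ""
    for line in lines:
        tokens = line.split()
        count = None
        if tokens and tokens[-1].isdigit():
            count = int(tokens.pop())  # trailing running total
        sequence += "".join(tokens)
        if count is not None and count != len(sequence):
            raise ValueError(f"running count {count} != {len(sequence)} bases")
    if len(sequence) != declared_length:
        raise ValueError(f"declared {declared_length} bp, found {len(sequence)}")
    return sequence


seq68 = check_sequence_block(
    ["CTGAACCTGT CTGGGACCAC AGTTGATGCT ATAGGATCAG ATCTAGAATT CATTTAGAGA 60",
     "GTGGCGTGGC TTGTGC 76"],
    declared_length=76,
)
print(len(seq68))  # 76
```

Checking both the per-line totals and the header length is a cheap way to catch dropped or duplicated bases in a transcribed listing.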
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
TCGACCGTTG GTAGGAATAA TGCAATTAAT GGAGTAGCTC TAAATTCAGA ATTCATCTAC 60 
ACCCAGTGCA TCCAGTAGCT 80 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
GGTAAACAGT AACGGTAAGA GTGCCAG 
(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
CGCCTTCAGC CTAAGAAGCG TAGTCCGGAA CGTCGTACGG GTAGGATCCA CTAG 
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

CACCGGTTCG GGGAATTAGT CTTGACCAGG CAGCCCAGGG C 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 51 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
ATTCCACACA TTATACGAGC CGGAAGCATA AAGTGTCAAG CCTGGGGTGC C 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
CTGCTCATCA GATGGCGGGA AGAGCTCGGC CATGGCTGGT TG 
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
GAACAGAGTG ACCGAGGGGG CGAGCTCGGC CATGGCTGGT TG 
1. A composition of matter comprising a
plurality of cells containing diverse combinations of first
and second DNA sequences encoding first and second
polypeptides which form heteromeric receptors, one or both
of said polypeptides being expressed as fusion proteins on
the surface of a cell.

2. The composition of claim 1, wherein said 
plurality of cells are E. coli.

3. The composition of claim 1, wherein said
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors
and transmitter receptors. 

4. The composition of claim 1, wherein said
first and second DNA sequences encode functional portions 
of heteromeric receptors. 

5. The composition of claim 4, wherein said 
first and second DNA sequences encode functional portions 
of the variable heavy and variable light chains of an 
antibody. 

6. The composition of claim 1, wherein said
cell produces filamentous bacteriophage.

7. The composition of claim 6, wherein said 
filamentous bacteriophage are selected from the group 
consisting of M13, fd and f1.

8. The composition of claim 6, wherein at least 
one of the encoded first or second polypeptides is 
expressed as a fusion protein with gene VIII. 
9. A kit for the preparation of vectors useful
for the coexpression of two or more DNA sequences encoding
polypeptides which form heteromeric receptors, comprising
two vectors, a first vector having two pairs of restriction
sites symmetrically oriented about a cloning site which can
be combined with a second vector having two pairs of
restriction sites symmetrically oriented about a cloning
site and in an identical orientation to that of the first
vector, wherein one or both vectors contain sequences
necessary for expression of polypeptides encoded by DNA
sequences inserted in said cloning sites.

10. The kit of claim 9, wherein said first and
second vectors are circular. 

11. The kit of claim 9, wherein said expression of
polypeptides is as fusion proteins on the surface of a cell.

12. The kit of claim 9, wherein said cell 
produces filamentous bacteriophage. 

13. The kit of claim 9, wherein said filamentous 
bacteriophage is selected from the group consisting of M13, 
fd and f1.

14. The kit of claim 13, wherein at least one of 
the DNA sequences is expressed as a fusion protein with 
gene VIII. 

15. The kit of claim 9, wherein said two pairs 
of restriction sites are Hind III-Mlu I and Hind III-Mlu I. 
16. A cloning system for the coexpression of two
or more DNA sequences encoding polypeptides which form a
heteromeric receptor, comprising a set of first vectors
having a diverse population of first DNA sequences and a
set of second vectors having a diverse population of second
DNA sequences, said first and second vectors having two
pairs of restriction sites symmetrically oriented about a
cloning site for containing said first and second
populations of DNA sequences so as to allow only the
operational combination of vector sequences containing said
first and second DNA sequences.

17. The cloning system of claim 16, wherein said 
first and second vectors are circular. 

18. The cloning system of claim 16, wherein said 
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors
and transmitter receptors. 

19. The cloning system of claim 16, wherein said 
first and second DNA sequences encode functional portions 
of heteromeric receptors. 

20. The cloning system of claim 19, wherein said 
first and second DNA sequences encode functional portions 
of the variable heavy and variable light chains of an 
antibody.

21. The cloning system of claim 16, wherein said 
coexpression of two or more DNA sequences encoding 
polypeptides which form a heteromeric receptor is on the 
surface of a cell.

22. The cloning system of claim 16, wherein said 
cell produces a filamentous bacteriophage. 
23. The cloning system of claim 22, wherein said
filamentous bacteriophage is selected from the group
consisting of M13, fd and f1.

24. The cloning system of claim 23, wherein at 
least one of the DNA sequences is expressed as a fusion 
protein with the protein product of gene VIII. 

25. The cloning system of claim 16, wherein said 
two pairs of restriction sites are Hind III-Mlu I and
Hind III-Mlu I.

26. A plurality of expression vectors containing
a plurality of possible first and second DNA sequences
encoding polypeptides which form a heteromeric receptor
exhibiting binding activity toward a preselected molecule,
said DNA sequences encoding heteromeric receptors being
operatively linked to genes encoding surface proteins of a
cell.

27. The expression vectors of claim 26, wherein 
said expression vectors are circular. 

28. The expression vectors of claim 26, wherein
said heteromeric receptors are selected from the group 
consisting of antibodies, T cell receptors, integrins,
hormone receptors and transmitter receptors. 

29. The expression vectors of claim 26, wherein 
said first and second DNA sequences encode functional 
portions of heteromeric receptors. 

30. The expression vectors of claim 29, wherein 
said first and second DNA sequences encode functional 
portions of the variable heavy and variable light chains of 
an antibody. 
31. The expression vectors of claim 26, wherein
said cells produce filamentous bacteriophage. 

32. The expression vectors of claim 26, wherein 
said filamentous bacteriophage are selected from the group 
consisting of M13, fd and f1.

33. The expression vectors of claim 32, wherein 
at least one of the encoded first or second polypeptides is 
expressed as a fusion protein with gene VIII. 

34. A method of constructing a diverse
population of vectors capable of expressing a diverse
population of heteromeric receptors, comprising:

(a) operationally linking to a first vector
a first population of diverse DNA
sequences encoding a diverse population
of first polypeptides, said first
vector having two pairs of restriction
sites symmetrically oriented about a
cloning site;

(b) operationally linking to a second
vector a second population of diverse
DNA sequences encoding a diverse
population of second polypeptides, said
second vector having two pairs of
restriction sites symmetrically
oriented about a cloning site in an
identical orientation to that of the
first vector; and

(c) combining the vector products of steps
(a) and (b) under conditions which
allow only the operational combination
of vector sequences containing said
first and second DNA sequences.
35. The method of claim 34, wherein said first
and second vectors are circular. 

36. The method of claim 34, wherein said 
heteromeric receptors are selected from the group 
consisting of antibodies, T cell receptors, integrins, 
hormone receptors and transmitter receptors. 

37. The method of claim 34, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

38. The method of claim 34, wherein said 
expression of a diverse population of heteromeric receptors 
is on the surface of a cell. 

39. The method of claim 38, wherein said cell
produces a bacteriophage. 

40. The method of claim 39, wherein said 
filamentous bacteriophage is selected from the group 
consisting of M13, fd and f1.

41. The method of claim 34, wherein at least one 
of said first or second DNA sequences is expressed as a 
gene VIII fusion protein. 

42. The method of claim 34, wherein said two 
pairs of restriction sites are Hind III-Mlu I and
Hind III-Mlu I.
43. The method of claim 34, wherein said
combining step further comprises:

(C1) restricting said first vector with a
restriction enzyme recognizing one of
the restriction sites encoded in said
two pairs of restriction sites;

(C2) restricting said second vector with a
different restriction enzyme
recognizing the second restriction
site encoded in said two pairs of
restriction sites;

(C3) digesting the 3' ends of said
restricted first and second vectors
with an exonuclease; and

(C4) annealing said first and second
vectors.
44. A method for selecting a heteromeric
receptor exhibiting binding activity toward a preselected
molecule from a population of diverse heteromeric
receptors, comprising:

(a) operationally linking to a first vector
a first population of diverse DNA
sequences encoding a diverse population
of first polypeptides, said first
vector having two pairs of restriction
sites symmetrically oriented about a
cloning site;

(b) operationally linking to a second
vector a second population of diverse
DNA sequences encoding a diverse
population of second polypeptides, said
second vector having two pairs of
restriction sites symmetrically
oriented about a cloning site in an
identical orientation to that of the
first vector;

(c) combining the vector products of steps
(a) and (b) under conditions which
allow only the operational combination
of vector sequences containing said
first and second DNA sequences;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing
said population of first and second DNA
sequences; and

(e) determining the heteromeric receptors
which bind to said preselected
molecule.
45. The method of claim 44, wherein said first
and second vectors are circular.

46. The method of claim 44, wherein said 
heteromeric receptors are selected from the group 
consisting of antibodies, T cell receptors, integrins, 
hormone receptors and transmitter receptors. 

47. The method of claim 44, wherein said first 
and second DNA sequences encode functional portions of 
heteromeric receptors. 

48. The method of claim 47, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

49. The method of claim 44, wherein said
expression of a diverse population of heteromeric receptors
is on the surface of a cell.

50. The method of claim 49, wherein said cell 
produces a filamentous bacteriophage. 

51. The method of claim 50, wherein said 
filamentous bacteriophage is selected from the group 
consisting of M13, fd and f1.

52. The method of claim 51, wherein at least one 
of said first or second DNA sequences is expressed as a 
gene VIII fusion protein. 

53. The method of claim 44, wherein said two 
pairs of restriction sites are Hind III-Mlu I and Hind III- 
Mlu I. 
54. The method of claim 44, wherein said
combining step further comprises:

(C1) restricting said first vector with a
restriction enzyme recognizing one of
the restriction sites encoded in said
two pairs of restriction sites;

(C2) restricting said second vector with a
different restriction enzyme
recognizing the second restriction
site encoded in said two pairs of
restriction sites;

(C3) digesting the 3' ends of said
restricted first and second vectors
with an exonuclease; and

(C4) annealing said first and second
vectors.
55. A method for determining the nucleic acid
sequences encoding a heteromeric receptor exhibiting
binding activity toward a preselected molecule from a
diverse population of heteromeric receptors, comprising:

(a) operationally linking to a first vector
a first population of diverse DNA
sequences encoding a diverse population
of first polypeptides, said first
vector having two pairs of restriction
sites symmetrically oriented about a
cloning site;

(b) operationally linking to a second
vector a second population of diverse
DNA sequences encoding a diverse
population of second polypeptides, said
second vector having two pairs of
restriction sites symmetrically
oriented about a cloning site in an
identical orientation to that of the
first vector;

(c) combining the vector products of steps
(a) and (b) under conditions which
allow only the operational combination
of vector sequences containing said
first and second DNA sequences;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing
said population of first and second DNA
sequences;

(e) determining the heteromeric receptors
which bind to said preselected
molecule;

(f) isolating the nucleic acid sequences
encoding said first and second
polypeptides; and

(g) sequencing said nucleic acid sequences.



56. The method of claim 55, wherein said first 
and second vectors are circular. 



57. The method of claim 55, wherein said
heteromeric receptors are selected from the group consisting of
antibodies, T cell receptors, integrins, hormone receptors 
and transmitter receptors. 



58. The method of claim 55, wherein said first 
and second DNA sequences encode functional portions of 
heteromeric receptors. 

59. The method of claim 58, wherein said first 
and second DNA sequences encode functional portions of the 
variable heavy and variable light chains of an antibody. 

60. The method of claim 55, wherein said
expression of a diverse population of heteromeric receptors
is on the surface of a cell filamentous bacteriophage
selected from the group consisting of M13, fd and f1 and at
least one of said first or second DNA sequences is
expressed as a gene VIII fusion protein.

61. The method of claim 55, wherein said cell 
produces filamentous bacteriophage. 



706204 PCT/US91/07149 

62. The method of claim 61, wherein said
filamentous bacteriophage is selected from the group
consisting of M13, fd and f1.



63. The method of claim 62, wherein at least one
of said first or second DNA sequences is expressed as a
gene VIII fusion protein.

64. The method of claim 50, wherein said two
pairs of restriction sites are Hind III-Mlu I and
Hind III-Mlu I.
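The two restriction sites named in claim 64 can be located in a vector sequence with a plain string search. The sketch below is an illustration under stated assumptions, not part of the claimed method: it uses the standard recognition sequences AAGCTT (Hind III) and ACGCGT (Mlu I), both of which are palindromic so scanning one strand is enough, and the helper name `find_sites` is hypothetical. Applied to row 3901 of the vector listing that follows the claims, it reports one site of each enzyme.

```python
# Illustrative sketch only: scan a DNA sequence for Hind III and Mlu I sites.
# find_sites is a hypothetical helper, not taken from the patent.

# Standard recognition sequences; both are palindromic, so scanning the
# top strand also finds every site on the bottom strand.
SITES = {
    "HindIII": "AAGCTT",
    "MluI": "ACGCGT",
}


def find_sites(seq: str, start: int = 1) -> list:
    """Return sorted (position, enzyme) pairs for every recognition site.

    `start` is the listing coordinate of seq[0], so positions are reported
    in the same 1-based numbering as the sequence figure.
    """
    seq = seq.upper().replace(" ", "")
    hits = []
    for enzyme, motif in SITES.items():
        i = seq.find(motif)
        while i != -1:
            hits.append((start + i, enzyme))
            i = seq.find(motif, i + 1)  # keep searching past this match
    return sorted(hits)


# Row 3901 of the vector listing, copied with its 10-base spacing.
row_3901 = ("AATTTAGGTC AGAAGATGAA GCTTACTAAA "
            "ATATATTTGA AAAAGTTTTC ACGCGTTCTT")

print(find_sites(row_3901, start=3901))  # [(3919, 'HindIII'), (3951, 'MluI')]
```

Reporting positions in the figure's own numbering makes it easy to cross-check any hit against the printed listing.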

65. The method of claim 50, wherein said
combining step further comprises:

(c1) restricting said first vector with a
restriction enzyme recognizing one of
the restriction sites encoded in said
two pairs of restriction sites;

(c2) restricting said second vector with a
different restriction enzyme
recognizing the second restriction
site encoded in said two pairs of
restriction sites;

(c3) digesting the 3' ends of said
restricted first and second vectors
with an exonuclease; and

(c4) annealing said first and second
vectors.
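Steps (c1) to (c4) appear to rest on a familiar principle: once an exonuclease has trimmed the 3' ends of two linear molecules, the exposed 5' single strands can pair if one is the reverse complement of the other. The sketch below models only that base-pairing check, under simplifying assumptions (exactly n nucleotides removed from each 3' end, perfect complementarity required, no ligation or repair). The fragment sequences are hypothetical, and the 38-base overlap is simply borrowed from row 3901 of the listing for illustration; the sketch does not reproduce the actual vector design.

```python
# Simplified model of exonuclease trimming followed by annealing.
# fragment_x and fragment_y are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a strand written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def exo_3prime_tails(top: str, n: int) -> tuple:
    """Single-stranded ends left after removing n nt from each 3' end.

    Returns (left_tail, right_tail), both written 5'->3': the left end
    exposes the first n bases of the top strand, and the right end
    exposes the first n bases of the bottom strand.
    """
    bottom = reverse_complement(top)  # bottom strand, read 5'->3'
    return top.upper()[:n], bottom[:n]


def can_anneal(tail_a: str, tail_b: str) -> bool:
    """True if two antiparallel single strands are fully complementary."""
    return len(tail_a) == len(tail_b) and tail_a.upper() == reverse_complement(tail_b)


# Sequence shared by the two fragments at the junction (from row 3901).
overlap = "AAGCTTACTAAAATATATTTGAAAAAGTTTTCACGCGT"
fragment_x = "GGATCCGTAC" + overlap  # hypothetical: ends with the overlap
fragment_y = overlap + "TTGACCATGG"  # hypothetical: begins with the overlap

n = len(overlap)
_, x_right = exo_3prime_tails(fragment_x, n)
y_left, _ = exo_3prime_tails(fragment_y, n)

print(can_anneal(y_left, x_right))  # True: the exposed strands pair
print(can_anneal(y_left, y_left))   # False: this overlap is not self-complementary
```

Because both tails are written 5'->3', complementarity reduces to comparing one tail with the reverse complement of the other.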
66. A vector comprising two copies of a gene
encoding a filamentous bacteriophage coat protein, one copy
of said gene capable of being operationally linked to a DNA
sequence encoding a polypeptide of a heteromeric receptor
wherein said DNA sequence can be expressed as a fusion
protein on the surface of said filamentous bacteriophage or
as a soluble polypeptide.

67. The vector of claim 66, wherein said two 
copies of said gene encode substantially the same amino 
acid sequence but have different nucleotide sequences. 
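Claim 67 relies on the redundancy of the genetic code: two copies of a gene can differ in nucleotide sequence yet encode substantially the same amino acid sequence. A minimal check, assuming the standard genetic code, is to translate both copies and compare the results. In the sketch below, `copy_1` is positions 1301-1327 of the vector listing, which appear to encode the start of the M13 gene VIII leader (MKKSLVLKA). `copy_2` is a hypothetical synonymous variant written only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (5'->3'); '*' marks stop codons."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Positions 1301-1327 of the vector listing.
copy_1 = "ATGAAAAAGTCTTTAGTCCTCAAAGCC"
# Hypothetical synonymous variant: different codons, same amino acids.
copy_2 = "ATGAAGAAATCATTGGTTCTGAAGGCT"

print(copy_1 == copy_2)                      # False
print(translate(copy_1), translate(copy_2))  # MKKSLVLKA MKKSLVLKA
```

Two nucleotide copies that translate identically are one way to keep the encoded coat protein the same while making the copies distinguishable at the DNA level.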

68. The vector of claim 66, wherein said one 
copy of said gene is expressed on the surface of said 
filamentous bacteriophage. 

69. The vector of claim 66, wherein said 
bacteriophage coat protein is M13 gene VIII. 

70. The vector of claim 66, wherein said vector
has substantially the same sequence as that shown in Figure
2 (SEQ ID NO: 1).

71. A vector comprising sequences necessary for
the coexpression of two or more inserted DNA sequences
encoding polypeptides which form heteromeric receptors and
two copies of a gene encoding a filamentous bacteriophage
coat protein, one copy of said gene capable of being
operationally linked to one of said two or more inserted
DNA sequences wherein said DNA sequence can be expressed as
a fusion protein on the surface of said filamentous
bacteriophage or as a soluble polypeptide.

72. The vector of claim 71, wherein said two 
copies of said gene encode substantially the same amino 
acid sequence but have different nucleotide sequences. 
73. The vector of claim 71, wherein said one
copy of said gene is expressed on the surface of said 
filamentous bacteriophage. 

74. The vector of claim 71, wherein said 
bacteriophage coat protein is M13 gene VIII. 

75. The vector of claim 71, wherein said vector
has substantially the same sequence as that shown in Figure
6 (SEQ ID NO: 5).
| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGNACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCNAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
3841 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900
3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960
3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020
4021 GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080
4081 CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140
4141 AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200
4201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260
4261 TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320
4321 TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380
4381 TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440
4441 TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500
4501 TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560
4561 TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620
4621 TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680
4681 GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740
4741 TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800
4801 AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860
4861 TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920
4921 CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980
4981 AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040
5041 TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100
5101 TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160
5161 TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220
5221 TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280
5281 TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340
5341 CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400
5401 AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460
5461 ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520
5521 GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580
5581 TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640
5641 GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760
5761 CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820
5821 CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880
5881 ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940
5941 CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000
6001 GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060
6061 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120
6121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180
6181 TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240
6241 GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300
6301 AAGCACTATT GCACTGGCAC TCTTACCGTT ACCGTTACTG TTTACCCCTG TGACAAAAGC 6360
6361 CGCCCAGGTC CAGCTGCTCG AGTCAGGCCT ATTGTGCCCA GGGGATTGTA CTAGTGGATC 6420
6421 CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT AGTTTACAGG CAAGTGCTAC 6480
6481 TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA GTTGGTGCTA CCATAGGGAT 6540
6541 TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAAGCA ATAGCGAAGA GGCCCGCACC 6600
6601 GATCGCCCTT CCCAACAGTT GCGCAGCCTG AATGGCGAAT GGCGCTTTGC CTGGTTTCCG 6660
6661 GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 6720
6721 GTCGTCCCCT CAAACTGGCA GATGCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 6780
6781 TATCCCATTA CGGTCAATCC GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 6840
6841 CTCACATTTA ATGTTGATGA AAGCTGGCTA CAGGAAGGCC AGACGCGAAT TATTTTTGAT 6900
6901 GGCGTTCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG AATTTTAACA 6960
6961 AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT CTTCCTGTTT TTGGGGCTTT 7020
7021 TCTGATTATC AACCGGGGTA CATATGATTG ACATGCTAGT TTTACGATTA CCGTTCATCG 7080
7081 ATTCTCTTGT TTGCTCCAGA CTCTCAGGCA ATGACCTGAT AGCCTTTGTA GATCTCTCAA 7140
7141 AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 7200
7201 GTGATTTGAC TGTCTCCGGC CTTTCTCACC CTTTTGAATC TTTACCTACA CATTACTCAG 7260
7261 GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 7320
7321 CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 7380
7381 GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440
7441 ACGTT 7445
| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGNACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCNAAGCCA 240
241 TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
3841 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900
3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960
3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020
4021 GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080
4081 CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140
4141 AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200
4201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260
4261 TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320
4321 TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380
4381 TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440
4441 TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500
4501 TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560
4561 TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620
4621 TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680
4681 GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740
4741 TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800
4801 AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860
4861 TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920
4921 CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980
4981 AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040
5041 TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100
5101 TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160
5161 TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220
5221 TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280
5281 TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340
5341 CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400
5401 AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460
5461 ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520
5521 GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580
5581 TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640
5641 GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760
5761 CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820
5821 CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880
5881 ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940
5941 CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000
6001 GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060
6061 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120
6121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180
6181 TTGTGAGCGG ATAACAATTT CACACGCCAA GGAGACAGTC ATAATGAAAT ACCTATTGCC 6240
6241 TACGGCAGCC GCTGGATTGT TATTACTCGC TGCCCAACCA GCCATGGCCG AGCTCGTGAT 6300
6301 GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAATT CTAGAACGCG TCACTTGGCA 6360
6361 CTGGCCGTCG TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA AGCTTAATCG 6420
6421 CCTTGCAGAA TTCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATCGCCC 6480
6481 TTCCCAACAG TTGCGCAGCC TGATTGGCGA ATGGCGCTTT GCCTGGTTTC CGGCACCAGA 6540
6541 AGCGGTGCCG GAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACGG TCGTCGTCCC 6600
6601 CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA CCTATCCCAT 6660
6661 TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT CGCTCACATT 6720
6721 TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG ATGGCGTTCC 6780
6781 TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA CAAAATATTA 6840
6841 ACGTTTACAA TTTAAATATT TGCTTATACA ATCTTCCTGT TTTTGGGGCT TTTCTGATTA 6900
6901 TCAACCGGGG TACATATGAT TGACATGCTA GTTTTACGAT TACCGTTCAT CGATTCTCTT 6960
6961 GTTTGCTCCA GACTCTCAGG CAATGACCTG ATAGCCTTTG TAGATCTCTC AAAAATAGCT 7020
7021 ACCCTCTCCG GCATTAATTT ATCAGCTAGA ACGGTTGAAT ATCATATTGA TGGTGATTTG 7080
7081 ACTGTCTCCG GCCTTTCTCA CCCTTTTGAA TCTTTACCTA CACATTACTC AGGCATTGCA 7140
7141 TTTAAAATAT ATGAGGGTTC TAAAAATTTT TATCCTTGCG TTGAAATAAA GGCTTCTCCC 7200
7201 GCAAAAGTAT TACAGGGTCA TAATGTTTTT GGTACAACCG ATTTAGCTTT ATGCTCTGAG 7260
7261 GCTTTATTGC TTAATTTTGC TAATTCTTTG CCTTGCCTGT ATGATTTATT GGATGTT 7317
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I 10 I 20 I 30 I 40 I 50 I 60 

i AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

61 ATAGCTAAAC AGGTTATTGA CCA1TTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 



121 CGTTCGCAGA ATT GGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCG" 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTC" 



241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCC 
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TAAAACGCG ATA' 
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTA' 
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTT 
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 



'ACTTTA 180 
AAGCCA 240 
GACCTG 300 
TTGAAG 360 
AATAGT 420 
AAAGCA 480 



60 



661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 



261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 



GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 



ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 
AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 
TGTACACCGT TCATCTGTCC TCTTTCAAAG TTG6TCA6TT CGGTTCCCTT ATGATTGACC 1080 
GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 
CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 
CAAAGATGA TGTTTTAGTG T r TTTTTi G T fTTCGl TTTAGGTTGI TGCC1 GTA 121 



CAAAGCCTCT GTAGCCCTTG CTACCCTCGT TCCGATGCTG TCTTTCCCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

rGCGTGGGCG ATC-GT7GTTG TCATTGTCGG CGCAACTA7C GGTATCAAC-C T6TTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA i ACAA1 I AAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1520 

TATTCTCACT CCGCTGAAAC TGTTGAA.AGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGG T TGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 



TGGGTTCCTA TTGGGCT" 
TCTGAGGGTG GCGGTTC" 
ATTCCGGGCT ATACTTA' 
AACCCCGCTA ATCCTAA 



"GC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

"GA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

AT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

.... CC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

07.ni aaraayrnr, flGGGTGGrfir; rrrTGAGGGA aaraanrra ajp.r-,Taarjr rr,r,urrr-,r,T ?zinn 

2401 GATTTTGATT AT GAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAAT GCCGAT 2460 



AC TGATTACGGT 2520 
AA TGGTGCTACT 2580 
'GA TAATTCACCT 2640 
'GA ATGTCGCCCT 2700 



2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCT, 
2521 GCTGCTATCG ATGGTTTCAT TGGTGAC6TT TCCGGCCTTG CTAATGGT 
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGG 1 "! 

2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTl - 

2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT GCCGTATCTG CTTACTTTTC 2940 
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG AC6TTAAACA. AAAAATCGTT TCTTATTTGG 3180 



318 
324 
330! 
336.' 
342 
348. 
354: 
360! 



ATTGGGATAA ATAA" 
CTCGTTAGCG TTGG' 
CTTGATTTAA GGCT' 
CTTAGAATAC C-GGA" 
TCCTACGATG AAA A 
ACCCGTTCTT GGAA" 
AAATTAGGAT GGGA" 
CGTTCTGCAT TAGC* 



ATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

AAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

'CAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

AAGCC TTCTA.TATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

'AAAAA CGGCTTGCT" GTTCTCGATG AGTG GGT/T TTGGTTTAAT 3480 

GATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

ATTAT TTT T CTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 



G A.AC A. TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACT 



3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACA' 
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTA" 
3781 ACTGGTAA6A ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGA' 



3660 
3720 
3780 
3840 
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38*11 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

402" ™ r.-r __™-r» ,,-r.n r^TP-r-rn „non 

408 
414 
420 
426 



GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 
^u- TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 
4321 TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGJAAAAGG 4380 
4381 TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 



444 
450. 
456 



TGTTTTACGT GCTAATAA' 
TAATCCAAAC AATCAGGA" 
TGATAATTCC GCTCCTTC" 



4621 TTTTAAAATT AATAACGT" 



T TTGATATGGT TGGilCAATT CCTTCCATAA TTCAGAAGTA 4500 

T ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

"G GTGGffTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

, , , linnn n,, ,.n,nn.v. "C GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

4681 GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

4741 TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

4801 AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTJTTAGA 4860 

4861 TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACC6 4920 

4921 CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

4981 AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

5041 TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 



5101 
5161 
5221 
5281 



ACTGG' 
CAAAA' 
CTGGA" 
ACTA A' 



CGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 51bU 
GTA GGTATTTCCA TGAGCGTTTT TCCT6TTGCA ATGGCTGGCG GTAAIATTGT 5220 
ATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGAJGTTAT 5280 
. ... CAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 

5341 CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 
5401 AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5450 
5451 ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 
Wl GTGTGGTGGT TAGGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 
5581 TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT JCCCCGTCAA GCTCTAAATC 5640 
5641 GGGG6CTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 
R7fil ffiTTfififlfiTr CflnGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 
5821 CTAtCfCGGG CTAT f GTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 
5881 ACAGGATTTT CGCCTGCTG6 GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940 
5941 CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 5000 
6001 GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060 
6061 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 
6121 TCACTCATTA GGCACCCCAG GCTTTACACT I I A ! bL l l LU bbUiUbiAib nt.HDibu.HH diou 
6181 TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 5240 
6241 GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTAC.AT GGAGAAAATA AAGTGAAACA 6300 
6301 AAGCACTATT GCACTGGCAC TCTTACCCTT ACTGTTTACC CCTGTGGCAA ,'^(„r AGGJ 6360 
6361 CCAGCTGCTC GAGTCGGTCT TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC 6420 
6421 AGCGGCCCTG GGCTGCCTGG TCAAGACTAA TTCCCCGAAC CGGTGACGGT GTCGTGGAAC 6480 
6481 TCAGGCGCCC TGACCAGCGG CGTGCACACC TTCCCGGCTG TCCTACAGTC CTCAGGACTC 6540 
6541 TACTCCCTCA GCAGCGTGGT GACCGTGCCC TCCAGCAGCT TGGGCACCCA GACCTACATC 6600 
6601 TGCAACGTGA ATCACAAGCC CAGCAACACC AAGGTGGACA AGAAAGCAGA GCCCAAATCT 6660
6661 TGTACTAGTG GATCCTACCC GTACGACGTT CCGGACTACG CTTCTTAGGC TGAAGGCGAT 6720
6721 GACCCTGCTA AGGCTGCATT CAATAGTTTA CAGGCAAGTG CTACTGAGTA CATTGGCTAC 6780 
6781 GCTTGGGCTA TGGTAGTAGT TATAGTTGGT GCTACCATAG GGATTAAATT ATTCAAAAAG 6840
6841 TTTACGAGCA AGGCTTCTTA AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC 6900
6901 AGTTGCGCAG CCTGAATGGC GAATGGCGCT TTGCCTGGTT TCCGGCACCA GAAGCGGTGC 6960 
6961 CGGAAAGCTG GCTGGAGTGC GATCTTCCTG AGGCCGATAC GGTCGTCGTC CCCTCAAACT 7020
7021 GGCAGATGCA CGGTTACGAT GCGCCCATCT ACACCAACGT AACCTATCCC ATTACGGTCA 7080
7081 ATCCGCCGTT TGTTCCCACG GAGAATCCGA CGGGTTGTTA CTCGCTCACA TTTAATGTTG 7140
7141 ATGAAAGCTG GCTACAGGAA GGCCAGACGC GAATTATTTT TGATGGCGTT CCTATTGGTT 7200 
7201 AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC 7260 
7261 AATTTAAATA TTTGCTTATA CAATCTTCCT GTTTTTGGGG CTTTTCTGAT TATCAACCGG 7320
7321 GGTACATATG ATTGACATGC TACTTTTACG ATTACCGTTC ATCGATTCTC TTGTTTGCTC 7380 
7381 CAGACTCTCA GGCAATGACC TGATAGCCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC 7440
7441 CGGCATTAAT TTATCAGCTA GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTCTC 7500 
7501 CGGCCTTTCT CACCCTTTTG AATCTTTACC TACACATTAC TCAGGCATTG CATTTAAAAT 7560 
7561 ATATGAGGGT TCTAAAAATT TTTATCCTTG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT 7620
7621 ATTACAGGGT CATAATGTTT TTGGTACAAC CGATTTAGCT TTATGCTCTG AGGCTTTATT 7680
7681 GCTTAATTTT GCTAATTCTT TGCCTTGCCT GTATGATTTA TTGGACGTT 7729
! 10 ! 20 ! 30 ! 40 ! 50 ! 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 
241 TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820



2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CCTGTTTCTT GCTCTTATTA TTGGGCTTAA 3000
3001 CTCAATTCTT GTGGGTTATC TCTCTGATAT TAGCGCTCAA TTACCCTCTG ACTTTGTTCA 3060
3061 GGGTGTTCAG TTAATTCTCC CGTCTAATGC GCTTCCCTGT TTTTATGTTA TTCTCTCTGT 3120
3121 AAAGGCTGCT ATTTTCATTT TTGACGTTAA ACAAAAAATC GTTTCTTATT TGGATTGGGA 3180 
3181 TAAATAATAT GGCTGTTTAT TTTGTAACTG GCAAATTAGG CTCTGGAAAG ACGCTCGTTA 3240 
3241 GCGTTGGTAA GATTCAGGAT AAAATTGTAG CTGGGTGCAA AATAGCAACT AATCTTGATT 3300
3301 TAAGGCTTCA AAACCTCCCG CAAGTCGGGA GGTTCGCTAA AACGCCTCGC GTTCTTAGAA 3360
3361 TACCGGATAA GCCTTCTATA TCTGATTTGC TTGCTATTGG GCGCGGTAAT GATTCCTACG 3420 
3421 ATGAAAATAA AAACGGCTTG CTTGTTCTCG ATGAGTGCGG TACTTGGTTT AATACCCGTT 3480
3481 CTTGGAATGA TAAGGAAAGA CAGCCGATTA TTGATTGGTT TCTACATGCT CGTAAATTAG 3540 



3541 GATGGGATAT TATTTTTCTT GTTCAGGACT TATCTATTGT TGATAAACAG GCGCGTTCTG 3600
3601 CATTAGCTGA ACATGTTGTT TATTGTCGTC GTCTGGACAG AATTACTTTA CCTTTTGTCG 3660
3661 GTACTTTATA TTCTCTTATT ACTGGCTCGA AAATGCCTCT GCCTAAATTA CATGTTGGCG 3720

3721 TTGTTAAATA TGGCGATTCT CAATTAAGCC GTACTGTTGA GCGTTGGCTT TATACTGGTA 3780 
3781 AGAATTTGTA TAACGCATAT GATACTAAAC AGGCTTTTTC TAGTAATTAT GATTCCGGTG 3840 
3841 TTTATTCTTA TTTAACGCCT TATTTATCAC ACGGTCGGTA TTTCAAACCA TTAAATTTAG 3900
3901 GTCAGAAGAT GAAGCTTACT AAAATATATT TGAAAAAGTT TTCACGCGTT CTTTGTCTTG 3960
3961 CGATTGGATT TGCATCAGCA TTTACATATA GTTATATAAC CCAACCTAAG CCGGAGGTTA 4020
4021 AAAAGGTAGT CTCTCAGACC TATGATTTTG ATAAATTCAC TATTGACTCT TCTCAGCGTC 4080
4081 TTAATCTAAG CTATCGCTAT GTTTTCAAGG ATTCTAAGGG AAAATTAATT AATAGCGACG 4140
4141 ATTTACAGAA GCAAGGTTAT TCACTCACAT ATATTGATTT ATGTACTGTT TCCATTAAAA 4200
4201 AAGGTAATTC AAATGAAATT GTTAAATGTA ATTAATTTTG TTTTCTTGAT GTTNGTTTCA 4260
4261 TCATCTTCTT TTGCTCAGGT AATTGAAATG AATAATTCGC CTCTGCGCGA TTTNGTAACT 4320
4321 TGGTATTCAA AGCAATCAGG CGAATCCGTT ATTGTTTCTC CCGATGTAAA AGGNACTGTT 4380
4381 ACTGTATATT CATCTGACGT TAAACCTGAA AATCTACGCA ATTTCTTTAT TTCNGTTTTA 4440
4441 CGTGCTAATA ATTTTGATAT GGTTGGTTCA ATTCCTTCCA TAATTCAGAA GTANAATCCN 4500
4501 AACAATCAGG ATTATATTGA TGAATTGCCA TCATCTGATA ATCAGGAATA TGATGATAAT 4560
4561 TCCGCTCCTT CTGGTGGTTT CTTTGTTCCG CAAAATGATA ATGTTACTCA AACTTTTAAA 4620
4621 ATTAATAACG TTCGGGCAAA GGATTTAATA CGAGTTGTCG AATTGTTTGT AAAGTCTAAT 4680
4681 ACTTCTAAAT CCTCAAATGT ATTATCTATT GACGGCTCTA ATCTATTAGT TGTTAGTGCA 4740
4741 CCTAAAGATA TTTTAGATAA CCTTCCTCAA TTCCTTTCTA CTGTTGATTT GCCAACTGAC 4800
4801 CAGATATTGA TTGAGGGTTT GATATTTGAG GTTCAGCAAG GTGATGCTTT AGATTTTTCA 4860
4861 TTTGCTGCTG GCTCTCAGCG TGGCACTGTT GCAGGCGGTG TTAATACTGA CCGCCTCACC 4920
4921 TCTGTTTTAT CTTCTGCTGG TGGTTCGTTC GGTATTTTTA ATGGCGATGT TTTAGGGCTA 4980
4981 TCAGTTCGCG CATTAAAGAC TAATAGCCAT TCAAAAATAT TGTCTGTGCC ACGTATTCTT 5040
5041 ACGCTTTCAG GTCAGAAGGG TTCTATCTCT GTTGGCCAGA ATGTCCCTTT TATTACTGGT 5100
5101 CGTGTGACTG GTGAATCTGC CAATGTAAAT AATCCATTTC AGACGATTGA GCGTCAAAAT 5160
5161 GTAGGTATTT CCATGAGCGT TTTTCCTGTT GCAATGGCTG GCGGTAATAT TGTTCTGGAT 5220
5221 ATTACCAGCA AGGCCGATAG TTTGAGTTCT TCTACTCAGG CAAGTGATGT TATTACTAAT 5280
5281 CAAAGAAGTA TTGCTACAAC GGTTAATTTG CGTGATGGAC AGACTCTTTT ACTCGGTGGC 5340
5341 CTCACTGATT ATAAAAACAC TTCTCAAGAT TCTGGCGTAC CGTTCCTGTC TAAAATCCCT 5400
5401 TTAATCGGCC TCCTGTTTAG CTCCCGCTCT GATTCCAACG AGGAAAGCAC GTTATACGTG 5460
5461 CTCGTCAAAG CAACCATAGT ACGCGCCCTG TAGCGGCGCA TTAAGCGCGG CGGGTGTGGT 5520
5521 GGTTACGCGC AGCGTGACCG CTACACTTGC CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT 5580
5881 TTTCGCCTGC TGGGGCAAAC CAGCGTGGAC CGCTTGCTGC AACTCTCTCA GGGCCAGGCG 5940
5941 GTGAAGGGCA ATCAGCTGTT GCCCGTCTCG CTGGTGAAAA GAAAAACCAC CCTGGCGCCC 6000
6001 AATACGCAAA CCGCCTCTCC CCGCGCGTTG GCCGATTCAT TAATGCAGCT GGCACGACAG 6060
6061 GTTTCCCGAC TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT AGCTCACTCA 6120
6121 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG GAATTGTGAG 6180
6181 CGGATAACAA TTTCACACGC CAAGGAGACA GTCATAATGA AATACCTATT GCCTACGGCA 6240
6241 GCCGCTGGAT TGTTATTACT CGCTGCCCAA CCAGCCATGG CCGAGCTCTT CCCGCCATCT 6300
6301 GATGAGCAGT TGAAATCTGG AACTGCCTCT GTTGTGTGCC TGCTGAATAA CTTCTATCCC 6360
6361 AGAGAGGCCA AAGTACAGTG GAAGGTGGAT AACGCCCTCC AATCGGGTAA CTCCCAGGAG 6420
6421 AGTGTCACAG AGCAGGACAG CAAGGACAGC ACCTACAGCC TCAGCAGCAC CCTGACGCTG 6480
6481 AGCAAAGCAG ACTACGAGAA ACACAAAGTC TACGCCTGCG AAGTCACCCA TCAGGGCCTG 6540
6541 AGCTCGCCCG TCACAAAGAG CTTCAACAGG GGAGAGTGTT CTAGAACGCG TCACTTGGCA 6600
6601 CTCGCCGTCG TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA AGCTTAATCG 6660
6661 CCTTGCAGAA TTCCCTTTCG CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATCGCCC 6720
6721 TTCCCAACAG TTGCGCAGCC TGAATGGCGA ATGGCGCTTT GCCTGGTTTC CGGCACCAGA 6780
6781 AGCGGTGCCG CAAAGCTGGC TGGAGTGCGA TCTTCCTGAG GCCGATACGG TCGTCGTCCC 6840
6841 CTCAAACTGG CAGATGCACG GTTACGATGC GCCCATCTAC ACCAACGTAA CCTATCCCAT 6900
6901 TACGGTCAAT CCGCCGTTTG TTCCCACGGA GAATCCGACG GGTTGTTACT CGCTCACATT 6960
6961 TAATGTTGAT GAAAGCTGGC TACAGGAAGG CCAGACGCGA ATTATTTTTG ATGGCGTTCC 7020
7021 TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTTTAA CAAAATATTA 7080
7081 ACGTTTACAA TTTAAATATT TGCTTATACA ATCTTCCTGT TTTTGGGGCT TTTCTGATTA 7140
7141 TCAACCGGGG TACATATGAT TGACATGCTA GTTTTACGAT TACCGTTCAT CGATTCTCTT 7200
7201 GTTTGCTCCA GACTCTCAGG CAATGACCTG ATAGCCTTTG TAGATCTCTC AAAAATAGCT 7260
7261 ACCCTCTCCG GCATTAATTT ATCAGCTAGA ACGGTTGAAT ATCATATTGA TGGTGATTTG 7320
7321 ACTGTCTCCG GCCTTTCTCA CCCTTTTGAA TCTTTACCTA CACATTACTC AGGCATTGCA 7380
7381 TTTAAAATAT ATGAGGGTTC TAAAAATTTT TATCCTTGCG TTGAAATAAA GGCTTCTCCC 7440
7441 GCAAAAGTAT TACAGGGTCA TAATGTTTTT GGTACAACCG ATTTAGCTTT ATGCTCTGAG 7500
7501 GCTTTATTGC TTAATTTTGC TAATTCTTTG CCTTGCCTGT ATGATTTATT GGATGTT 7557
! 10 ! 20 ! 30 ! 40 ! 50 ! 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAACGA 480 
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 



1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740



1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 



1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
3841 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900
3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960

3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACTTAAGCCG 4020
4021 GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 
TATTCTTACG
TACTGGTCGT
TCAAAATGTA
TCTGGATATT
TACTAATCAA



5341 CGGTGGCCTC 
5401 AATCCCTTTA 
5461 ATACGTGCTC 
5521 GTGTGGTGGT 
5581 TCGCTTTCTT 
612:
GGCGCCCAAT
ACGACAGGTT 
TCACTCATTA 
TTGTGAGCGG 
TACGGCAGCC 
6301 GCCATCTGAT 
6361 CTATCCCAGA 
CCAGGAGAGT 
GACGCTGAGC 
GGGCCTGAGC 
CTGTTTACCC
6781 CCCTCCTCCA
TCTTCTTTTG
GTATATTCAT
ATATTGATTG
GTTTTATCTT
GGTATTTCCA
ACCAGCAAGG 
AGAAGTATTG 
ACTGATTATA 
TCGCTATGTT
AGGTTATTCA
TGAAATTGTT
CTCAGGTAAT
AATCAGGCGA 
CTGACGTTAA 
TTGATATGGT 
ATATTGATGA 
GTGGTTTCTT 
GGGCAAAGGA 
CAAATGTATT 
TAGATAACCT 
AGGGTTTGAT 
CTCAGCGTGG 
CTGCTGGTGG 
TAAAGACTAA 
AGAAGGGTTC 
CCATAGTACG
GTGACCGCTA 
CTCGCCACGT 
CGATTTAGTG 
AGTGGGCCAT 
AATAGTGGAC 
GATTTATAAG 
GGCAAACCAG 
AGCTGTTGCC 
CCTCTCCCCG 
AAAGCGGGCA 



TTCAAGGATT 
CTCACATATA 
AAATGTAATT
TGAAATGAAT
ATCCGTTATT 
ACCTGAAAAT 
TGGTTCAATT 
ATTGCCATCA 
TGTTCCGCAA 
TTTAATACGA 
ATCTATTGAC 
TCCTCAATTC 
ATTTGAGGTT 
CACTGTTGCA 
TTCGTTCGGT 



CTAAGGGAAA
TTGATTTATG
AATTTTGTTT
AATTCGCCTC 
GTTTCTCCCG 
CTACGCAATT 
CCTTCCATAA 
TCTGATAATC 
AATGATAATG 
GTTGTCGAAT 



ATTAATTAAT 4140 
TACTGTTTCC 4200 
TCTTGATGTT 4260 
TGCGCGATTT 4320 
ATGTAAAAGG 4380 
TCTTTATTTC
TTCAGAAGTA 4500 
AGGAATATGA 4560 
TTACTCAAAC 4620 



GC1 

CACACGCCAA 
TATTACTCGC 
AATCTGGAAC 
TACAGTGGAA 
AGGACAGCAA 
ACGAGAAACA 
CAAAGAGCTT 
TACAACGTCG 
GGATTTTGCC
CGTGGACCGC 

rGTnTGfTG 

CGCGTTGGCC 
GTGAGCGCAA 



GGAGACAGTC 
TGCCCAACCA 
TGCCTCTGTT 
GGTGGATAAC 
GGACAGCACC 
CAAAGTCTAC 
CAACAGGGGA 
TGACTGGGAA 
AGCACTATTG 
ACCAAGGGCC 
GCGGCCCTGG 
CAGGCGCCCT 
ACTCCCTCAG 
GCAACGTGAA 
GTACTAGTGG 
ACCCTGCTAA 
CTTGGGCTAT
TTACGAGCAA 
GTTGCGCAGC 
GGAAAGCTGG 
GCAGATGCAC 
TCCGCCGTTT 
TGAAAGCTGG 
AAAAATGAGC 
ATTTAAATAT 
GTACATATGA 
AGACTCTCAG 



GGCATTAA' 
GGCCTTTC 
TATGAGGG' 
TTACAGGG' 



GGCTCTAA' 
CTTTCT AC- 
CAGCAAGG" . 
GGCGGTGTTA 
ATTTTTAATG
AAAATATTGT 
GGCCAGAATG 
CCATTTCAGA 
ATGGCTGGCG 
ACTCAGGCAA 
GATGGACAGA 
GGCGTACCGT 
GACGGTTTTT
GTGAAAAGAA
ATAATGAAAT
GCCATGGCCG 
GTGTGCCTGC 
GCCCTCCAAT 
TACAGCCTCA 
GCCTGCGAAG 
GAGTGTTCTA 
AACCCTGGCG 
CACTGGCACT 
CATCGGTCTT 
GCTGCCTGGT 
GACCAGCGGC 
CAGCGTGGTG 
GGCTTCTTAA
ATAATGTTTT
CTAATTCTTT 



TGTTTG" 
TATTAG" 
TTGATT 
ATGCTT 



AAA 4680 

TGT 4740 

GCC 4800 

AGA 4860 



.iTGL. . ,. _.. 

ATACTGACCG 4920 
GCGATGTTTT 4980 
CTGTGCCACG 5040 